# Multi-Agent Graph Similarity Validation Experiment
# Enhanced and Cleaned Version


## Objective
Test whether multiple agents independently build similar semantic graph structures when trained on ConceptNet data with injected false triples. The experiment is designed to test the hypothesis that there is a universal theory of meaning that is self-reinforcing, rejects contradiction, and shows clear paths of reasoning.



## Experimental Design
- **Agents**: 5-10 independent agents
- **Data Source**: ConceptNet triples with randomly generated false triples injected
- **Training**: No pre-filtering of the data; each agent decides to ACCEPT, REJECT, or REVIEW every triple
- **Validation**: Measure graph similarity, structure, and false triple influence
- **Scalability**: Designed to scale from small experiments to full 3M dataset
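
The injection step in the design above can be illustrated with a minimal sketch. It assumes that false triples are produced by recombining subjects, relations, and objects drawn from the true set, and that the ratio is measured against the number of true triples; neither detail is specified in this notebook. The function name `inject_false_triples` is hypothetical.

```python
import random

def inject_false_triples(true_triples, ratio=0.1, seed=42):
    """Mix corrupted triples into a list of true triples.

    Assumption: a false triple is any random (subject, relation, object)
    recombination that does not already appear in the true set.
    Returns (triples, is_false_flags) in shuffled order.
    """
    rng = random.Random(seed)
    known = set(true_triples)
    subjects = [s for s, _, _ in true_triples]
    relations = [r for _, r, _ in true_triples]
    objects = [o for _, _, o in true_triples]

    n_false = int(len(true_triples) * ratio)
    false_triples = set()
    attempts = 0
    # Cap the number of attempts so tiny vocabularies cannot loop forever.
    while len(false_triples) < n_false and attempts < 100 * max(1, n_false):
        attempts += 1
        candidate = (rng.choice(subjects), rng.choice(relations), rng.choice(objects))
        if candidate not in known:
            false_triples.add(candidate)

    labelled = [(t, False) for t in true_triples]
    labelled += [(t, True) for t in sorted(false_triples)]
    rng.shuffle(labelled)
    return [t for t, _ in labelled], [flag for _, flag in labelled]

true_set = [(f"s{i}", "IsA", f"o{i}") for i in range(20)]
triples, flags = inject_false_triples(true_set, ratio=0.1)
print(len(triples), sum(flags))
```

Keeping the `is_false` flags alongside the triples is what lets each agent's decisions be scored later without the agent ever seeing the labels.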

## Key Hypothesis
If multiple agents build similar semantic structures without shared optimization or influence, this provides evidence for a fundamental theory of meaning.
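
One simple way to quantify "similar semantic structures" is the mean pairwise Jaccard overlap of the triples each agent accepted. The sketch below is illustrative only: `mean_pairwise_jaccard` is a hypothetical helper, and it treats each agent's graph as a plain set of (subject, relation, object) tuples rather than reproducing the similarity metrics listed in the configuration.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)

def mean_pairwise_jaccard(agent_triples):
    """Average Jaccard similarity over every unordered pair of agents.

    agent_triples maps an agent id to the set of triples it accepted.
    """
    pairs = list(combinations(sorted(agent_triples), 2))
    if not pairs:
        raise ValueError("need at least two agents to compare")
    total = sum(jaccard(agent_triples[x], agent_triples[y]) for x, y in pairs)
    return total / len(pairs)

t1 = ("dog", "IsA", "animal")
t2 = ("cat", "IsA", "animal")
t3 = ("bird", "IsA", "animal")
example = {"agent_0": {t1, t2}, "agent_1": {t1, t3}, "agent_2": {t1, t2}}
print(round(mean_pairwise_jaccard(example), 4))
```

A score near 1.0 would mean the agents converged on nearly the same accepted triples; a score near 0.0 would mean their graphs barely overlap.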


# SECTION 1: IMPORTS AND CONFIGURATION

In [1]:
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import jaccard_score, adjusted_rand_score
from sklearn.cluster import KMeans
from scipy.spatial.distance import cosine, euclidean
from scipy.stats import pearsonr, spearmanr
import json
import pickle
import random
import time
from datetime import datetime, timedelta
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
np.random.seed(42)
random.seed(42)

print("Libraries imported successfully")
print(f"Experiment started at: {datetime.now()}")

# EXPERIMENTAL CONFIGURATION

CONFIG = {
    'experiment_name': 'multi_agent_validation_v1',
    'num_agents': 7,  # Start with 7 agents for comprehensive comparison
    'num_epochs': 10,  # Number of complete passes through the dataset
    'max_iterations': 100,  # Training iterations per epoch
    'false_triple_ratio': 0.1,  # 10% false triples injected
    'batch_size': 100,  # Triples per training batch
    'validation_threshold': 0.7,  # Agent validation confidence threshold
    'sample_size': 50_000,  # Initial sample from ConceptNet (scalable)
    'quality_threshold': 0.8,  # Minimum quality for triple acceptance
    'similarity_metrics': ['jaccard', 'weighted_jaccard', 'structural', 'semantic', 'path_based'],
    'save_checkpoints': True,
    'verbose': True,
    'epoch_verbose': True,  # Verbose reporting at epoch level
    'dataset_coverage_per_epoch': 0.8,  # Fraction of dataset to cover per epoch (for robustness)
    
    # === RESOLUTION CONFIGURATION ===
    'max_resolution_attempts_per_batch': 50,  # Maximum relations to attempt resolution per batch
    'top_weight_resolution_limit': 100,  # Only consider top N highest-weighted pending relations
    
    # === RELATION SUGGESTION CONFIGURATION ===
    'enable_relation_suggestions': True,  # Enable proactive relation suggestion
    'suggestion_frequency': 0.2,  # Frequency of suggesting new relations (0.0-1.0)
    'max_suggestions_per_batch': 15,  # Maximum number of suggestions to generate per batch
    'suggestion_confidence_threshold': 0.6,  # Minimum confidence for suggesting a relation
    'suggestion_diversity_factor': 0.3,  # Factor to encourage diverse relation types in suggestions
    'suggestion_graph_exploration_depth': 2,  # How many hops to explore for suggestions
    'min_graph_size_for_suggestions': 10,  # Minimum graph size before starting suggestions
    'suggestion_novelty_weight': 0.4,  # Weight for novelty in suggestion scoring
    'suggestion_semantic_weight': 0.6,  # Weight for semantic similarity in suggestion scoring
    
    # === ADAPTIVE TRAINING CONFIGURATION ===
    'adaptive_training_mode': False,  # Enable intelligent training mode switching
    'coverage_based_training': False,  # Try coverage-based training first
    'target_coverage': 0.9,  # 90% coverage target for coverage-based training
    'coverage_timeout_minutes': 15,  # Max time to spend on coverage-based training
    'coverage_max_iterations': 500,  # Max iterations for coverage-based training
    'fallback_to_epochs': True,  # Fall back to epoch-based if coverage fails
    'coverage_check_frequency': 10,  # Check coverage every N iterations
    'coverage_progress_threshold': 0.01,  # Minimum progress required every 50 iterations
}

# File paths
path_dir = r'C:\Users\erich\OneDrive\Documents\Python Projects\Semantica-Full-Reasoning-Chatbot\Data'
DATA_PATH = Path(path_dir) / 'Input'
OUTPUT_PATH = Path(path_dir) / 'Output'

print("Configuration:")
for key, value in CONFIG.items():
    print(f"  {key}: {value}")

Libraries imported successfully
Experiment started at: 2025-05-27 16:30:07.719483
Configuration:
  experiment_name: multi_agent_validation_v1
  num_agents: 7
  num_epochs: 10
  max_iterations: 100
  false_triple_ratio: 0.1
  batch_size: 100
  validation_threshold: 0.7
  sample_size: 50000
  quality_threshold: 0.8
  similarity_metrics: ['jaccard', 'weighted_jaccard', 'structural', 'semantic', 'path_based']
  save_checkpoints: True
  verbose: True
  epoch_verbose: True
  dataset_coverage_per_epoch: 0.8
  max_resolution_attempts_per_batch: 50
  top_weight_resolution_limit: 100
  enable_relation_suggestions: True
  suggestion_frequency: 0.2
  max_suggestions_per_batch: 15
  suggestion_confidence_threshold: 0.6
  suggestion_diversity_factor: 0.3
  suggestion_graph_exploration_depth: 2
  min_graph_size_for_suggestions: 10
  suggestion_novelty_weight: 0.4
  suggestion_semantic_weight: 0.6
  adaptive_training_mode: False
  coverage_based_training: False
  target_coverage: 0.9
  coverage_timeou


# SECTION 2: VALIDATION AGENT CLASS
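
The class in this section routes each triple to ACCEPT, REJECT, or REVIEW by comparing a validation score against two thresholds that shift with the triple's edge weight. The standalone sketch below restates that rule in isolation; `decide` is a hypothetical name, and forced decisions and epoch locking are deliberately left out.

```python
def decide(score, edge_weight, quality_threshold=0.8):
    """Three-way decision using the weight-adjusted thresholds from the agent class."""
    if edge_weight >= 0.8:
        # High-weight triples are easier to accept.
        accept_t, reject_t = quality_threshold - 0.1, 0.3
    elif edge_weight <= 0.3:
        # Low-weight triples are harder to accept and easier to reject.
        accept_t, reject_t = quality_threshold + 0.1, 0.4
    else:
        accept_t, reject_t = quality_threshold, 0.3

    if score >= accept_t:
        return "ACCEPT"
    if score <= reject_t:
        return "REJECT"
    return "REVIEW"

for score, weight in [(0.75, 0.9), (0.75, 0.5), (0.35, 0.2), (0.35, 0.5)]:
    print(score, weight, decide(score, weight))
```

The same score can therefore lead to different outcomes: 0.75 is accepted for a high-weight triple but sent to review for a mid-weight one.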


In [2]:
class ValidationAgent:
    """
    Independent validation agent for semantic graph construction
    
    ENHANCED RESOLUTION LOGIC:
    - Try to validate a triple → if uncertain, goes to REVIEW
    - After each batch, attempt to resolve pending REVIEW relations (prioritized by weight)
    - Only attempt resolution on top N highest-weighted pending relations
    - Configurable limit on resolution attempts per batch
    - Each relation can be reviewed at most 5 times per epoch
    - If it is still unresolved after 5 review attempts, it stays in REVIEW for the rest of the epoch (epoch locked)
    - At epoch end, reset both attempt counts and review counts for fresh integration
    
    RELATION SUGGESTION CAPABILITIES:
    - Proactively generate new relation suggestions based on current graph structure
    - Use graph exploration and pattern analysis to suggest likely relations
    - Track suggestion performance and accuracy over time
    - Configurable suggestion frequency and confidence thresholds
    """
    
    def __init__(self, agent_id, config):
        self.agent_id = agent_id
        self.config = config
        self.graph = nx.DiGraph()
        self.validation_history = []
        self.decision_log = {'ACCEPT': 0, 'REJECT': 0, 'REVIEW': 0, 'FORCED_DECISION': 0}
        self.quality_scores = []
        
        # Enhanced tracking for relation resolution
        self.relation_attempts = {}  # Track attempts per relation per batch cycle
        self.relation_review_count = {}  # Track total reviews per relation per epoch
        self.pending_relations = {}  # Store relations under review
        
        # === RELATION SUGGESTION TRACKING ===
        self.suggested_relations = {}  # Track suggested relations and their outcomes
        self.suggestion_history = []  # Track all suggestions made
        self.suggestions_accepted = 0  # Count of suggestions that were accepted
        self.suggestions_rejected = 0  # Count of suggestions that were rejected
        self.suggestion_scores = []  # Track suggestion confidence scores
        
        self.training_metrics = {
            'iterations_completed': 0,
            'epochs_completed': 0,
            'triples_processed': 0,
            'false_triples_detected': 0,
            'accuracy': 0.0,
            'epoch_accuracies': [],  # Track accuracy across epochs
            'epoch_nodes': [],       # Track graph size across epochs
            'epoch_edges': [],
            'relations_resolved': 0,  # Track successful resolutions
            'forced_decisions': 0,    # Track forced decisions
            'resolution_attempts_per_batch': [],  # Track resolution workload per batch
            'top_weight_resolutions': 0,  # Track how many top-weight relations were resolved
            
            # === SUGGESTION METRICS ===
            'suggestions_generated': 0,  # Total suggestions generated
            'suggestions_accepted': 0,   # Suggestions that were validated and accepted
            'suggestions_rejected': 0,   # Suggestions that were validated and rejected
            'suggestion_accuracy': 0.0,  # Accuracy of suggestion validation
            'avg_suggestion_confidence': 0.0,  # Average confidence of suggestions
            'suggestion_diversity': 0.0,  # Diversity of suggested relation types
        }
        
    def calculate_validation_score(self, triple, edge_weight=1.0):
        """Calculate validation score for a triple based on existing graph context"""
        subj, rel, obj = triple
        
        # Base score influenced by edge weight
        score = 0.4 + (edge_weight * 0.1)  # Higher weights get slightly higher base scores
        
        # Check for existing relationships
        if self.graph.has_node(subj) and self.graph.has_node(obj):
            # Check for direct connection
            if self.graph.has_edge(subj, obj):
                existing_rel = self.graph[subj][obj].get('relation', '')
                existing_weight = self.graph[subj][obj].get('weight', 1.0)
                
                if existing_rel == rel:
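                    # Strengthen score based on weight consistency.
                    # Guard against division by zero when both weights are 0.
                    max_weight = max(existing_weight, edge_weight)
                    if max_weight > 0:
                        weight_consistency = 1 - abs(existing_weight - edge_weight) / max_weight
                    else:
                        weight_consistency = 1.0
                    score += 0.3 * weight_consistency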
                else:
                    score -= 0.1  # Potential contradiction
            
            # Check for semantic consistency
            subj_neighbors = set(self.graph.neighbors(subj))
            obj_neighbors = set(self.graph.neighbors(obj))
            common_neighbors = len(subj_neighbors.intersection(obj_neighbors))
            
            if common_neighbors > 0:
                score += min(0.2, common_neighbors * 0.05)
        
        # Edge weight influence on validation
        if edge_weight >= 0.8:
            score += 0.1  # High confidence triples get bonus
        elif edge_weight <= 0.3:
            score -= 0.1  # Low confidence triples get penalty
        
        # Add noise for realism
        score += np.random.normal(0, 0.05)
        
        return max(0.0, min(1.0, score))
    
    def validate_triple(self, triple, edge_weight=1.0, is_false=False, force_decision=False):
        """Validate a triple, applying epoch locking and forced-resolution logic"""
        triple_key = str(triple)
        
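        # Initialize tracking for new relations. The two counters are checked
        # separately: relation_attempts is cleared at the start of every batch,
        # while relation_review_count must persist for the whole epoch so the
        # 5-review limit can actually be reached.
        if triple_key not in self.relation_attempts:
            self.relation_attempts[triple_key] = 0
        if triple_key not in self.relation_review_count:
            self.relation_review_count[triple_key] = 0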
        
        # Check if this relation is epoch-locked (exceeded 5 reviews this epoch)
        if triple_key in self.pending_relations and self.pending_relations[triple_key].get('epoch_locked', False):
            # Return REVIEW and don't process further this epoch
            self.decision_log['REVIEW'] += 1
            
            self.validation_history.append({
                'triple': triple,
                'edge_weight': edge_weight,
                'score': 0.5,  # Neutral score for locked relations
                'decision': 'REVIEW',
                'is_false': is_false,
                'correct': None,  # Cannot determine correctness for locked relations
                'attempt_number': self.relation_attempts[triple_key] + 1,
                'total_reviews': self.relation_review_count[triple_key],
                'forced': False,
                'epoch_locked': True,
                'epoch': self.training_metrics['epochs_completed'],
                'iteration': self.training_metrics['iterations_completed']
            })
            
            return 'REVIEW', 0.5
        
        # Calculate base validation score
        score = self.calculate_validation_score(triple, edge_weight)
        
        # Enhanced decision logic with forced resolution
        quality_threshold = self.config['quality_threshold']
        
        # Adjust thresholds based on edge weight
        if edge_weight >= 0.8:
            accept_threshold = quality_threshold - 0.1  # Lower threshold for high-weight triples
            reject_threshold = 0.3
        elif edge_weight <= 0.3:
            accept_threshold = quality_threshold + 0.1  # Higher threshold for low-weight triples
            reject_threshold = 0.4
        else:
            accept_threshold = quality_threshold
            reject_threshold = 0.3
        
        # Check if we need to force a decision
        attempts = self.relation_attempts[triple_key]
        total_reviews = self.relation_review_count[triple_key]
        
        if force_decision or attempts >= 3 or total_reviews >= 5:
            # Force a binary decision (ACCEPT or REJECT only)
            if score >= (accept_threshold + reject_threshold) / 2:
                decision = 'ACCEPT'
            else:
                decision = 'REJECT'
            
            if attempts >= 3 or total_reviews >= 5:
                self.decision_log['FORCED_DECISION'] += 1
                self.training_metrics['forced_decisions'] += 1
                
        else:
            # Normal decision logic with review capability
            if score >= accept_threshold:
                decision = 'ACCEPT'
            elif score <= reject_threshold:
                decision = 'REJECT'
            else:
                decision = 'REVIEW'
                self.relation_review_count[triple_key] += 1
                
                # Calculate priority weight for pending relation
                priority_weight = self._calculate_priority_weight(triple, edge_weight, score)
                
                # Add to pending relations or update existing entry
                self.pending_relations[triple_key] = {
                    'triple': triple,
                    'edge_weight': edge_weight,
                    'is_false': is_false,
                    'attempts': attempts,
                    'epoch_locked': False,  # Initialize as not locked
                    'priority_weight': priority_weight,  # Add priority scoring
                    'last_score': score  # Track latest validation score
                }
                
                # Check if this relation should be epoch-locked
                if self.relation_review_count[triple_key] >= 5:
                    self.pending_relations[triple_key]['epoch_locked'] = True
        
        # Update attempt counter for this relation
        self.relation_attempts[triple_key] += 1
        
        # Track performance
        self.decision_log[decision] += 1
        correct = (decision == 'REJECT') if is_false else (decision == 'ACCEPT')
        
        if decision in ['ACCEPT', 'REJECT']:
            self.training_metrics['relations_resolved'] += 1
            # Remove from pending if resolved
            if triple_key in self.pending_relations:
                del self.pending_relations[triple_key]
        
        self.validation_history.append({
            'triple': triple,
            'edge_weight': edge_weight,
            'score': score,
            'decision': decision,
            'is_false': is_false,
            'correct': correct,
            'attempt_number': attempts + 1,
            'total_reviews': self.relation_review_count[triple_key],
            'forced': attempts >= 3 or total_reviews >= 5,
            'epoch_locked': self.pending_relations.get(triple_key, {}).get('epoch_locked', False),
            'epoch': self.training_metrics['epochs_completed'],
            'iteration': self.training_metrics['iterations_completed']
        })
        
        return decision, score
    
    def _calculate_priority_weight(self, triple, edge_weight, validation_score):
        """Calculate priority weight for pending relations based on multiple factors"""
        # Base weight from edge weight (30% influence)
        priority = edge_weight * 0.3
        
        # Validation score influence (40% influence)
        priority += validation_score * 0.4
        
        # Graph connectivity influence (20% influence)
        subj, rel, obj = triple
        connectivity_score = 0
        
        if self.graph.has_node(subj):
            connectivity_score += min(0.1, self.graph.degree(subj) * 0.01)
        if self.graph.has_node(obj):
            connectivity_score += min(0.1, self.graph.degree(obj) * 0.01)
            
        priority += connectivity_score * 0.2
        
        # Novelty bonus (10% influence) - prefer new concepts
        novelty_bonus = 0
        if not self.graph.has_node(subj):
            novelty_bonus += 0.05
        if not self.graph.has_node(obj):
            novelty_bonus += 0.05
            
        priority += novelty_bonus * 0.1
        
        # Add small random component for tie-breaking
        priority += np.random.uniform(0, 0.01)
        
        return max(0.0, min(1.0, priority))
    
    def generate_relation_suggestions(self):
        """Generate new relation suggestions based on current graph structure"""
        if not self.config.get('enable_relation_suggestions', False):
            return []
            
        # Don't suggest if graph is too small
        if self.graph.number_of_nodes() < self.config.get('min_graph_size_for_suggestions', 10):
            return []
            
        # Check if we should generate suggestions based on frequency
        if np.random.random() > self.config.get('suggestion_frequency', 0.2):
            return []
            
        suggestions = []
        max_suggestions = self.config.get('max_suggestions_per_batch', 15)
        confidence_threshold = self.config.get('suggestion_confidence_threshold', 0.6)
        exploration_depth = self.config.get('suggestion_graph_exploration_depth', 2)
        
        # Get all nodes for potential suggestion candidates
        nodes = list(self.graph.nodes())
        if len(nodes) < 2:
            return []
            
        # Get existing relation types to encourage diversity
        existing_relations = set()
        for _, _, data in self.graph.edges(data=True):
            if 'relation' in data:
                existing_relations.add(data['relation'])
                
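        # Sample nodes for suggestion exploration. Indices are sampled instead of
        # the nodes themselves so the original Python objects are kept;
        # np.random.choice on a list of strings returns np.str_ values, which can
        # alter the str(triple) keys used for tracking.
        sample_idx = np.random.choice(len(nodes), size=min(20, len(nodes)), replace=False)
        sampled_nodes = [nodes[i] for i in sample_idx]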
        
        for node in sampled_nodes:
            if len(suggestions) >= max_suggestions:
                break
                
            # Find potential relation targets through graph exploration
            potential_targets = self._explore_graph_for_suggestions(node, exploration_depth)
            
            for target_node, suggested_relation, confidence in potential_targets:
                if len(suggestions) >= max_suggestions:
                    break
                    
                if confidence >= confidence_threshold:
                    # Check if this relation already exists
                    if not self.graph.has_edge(node, target_node):
                        # Calculate suggestion priority
                        priority = self._calculate_suggestion_priority(
                            node, target_node, suggested_relation, confidence, existing_relations
                        )
                        
                        suggestion = {
                            'triple': (node, suggested_relation, target_node),
                            'confidence': confidence,
                            'priority': priority,
                            'suggested_weight': min(0.9, confidence + 0.1),  # Convert confidence to weight
                            'source': 'graph_exploration',
                            'exploration_depth': exploration_depth
                        }
                        
                        suggestions.append(suggestion)
                        
        # Sort suggestions by priority and return top ones
        suggestions.sort(key=lambda x: x['priority'], reverse=True)
        return suggestions[:max_suggestions]
    
    def _explore_graph_for_suggestions(self, start_node, max_depth):
        """Explore graph to find potential relation suggestions"""
        suggestions = []
        visited = {start_node}
        
        # BFS exploration from start_node
        current_level = [start_node]
        
        for depth in range(max_depth):
            next_level = []
            
            for node in current_level:
                # Explore neighbors
                for neighbor in self.graph.neighbors(node):
                    if neighbor not in visited:
                        visited.add(neighbor)
                        next_level.append(neighbor)
                        
                        # Suggest potential relations based on path analysis
                        relation_suggestions = self._analyze_path_for_relations(start_node, neighbor, depth + 1)
                        suggestions.extend(relation_suggestions)
                        
            current_level = next_level
            if not current_level:  # No more nodes to explore
                break
                
        return suggestions
    
    def _analyze_path_for_relations(self, source, target, path_length):
        """Analyze potential relations between source and target based on graph patterns"""
        suggestions = []
        
        # Get relation types from the graph
        graph_relations = set()
        for _, _, data in self.graph.edges(data=True):
            if 'relation' in data:
                graph_relations.add(data['relation'])
                
        if not graph_relations:
            return suggestions
            
        # Analyze common neighbors for relation inference
        source_neighbors = set(self.graph.neighbors(source))
        target_neighbors = set(self.graph.neighbors(target))
        common_neighbors = source_neighbors.intersection(target_neighbors)
        
        # Base confidence starts lower for longer paths
        base_confidence = max(0.3, 0.8 - (path_length * 0.15))
        
        # Suggest relations based on common neighbor patterns
        for relation in graph_relations:
            confidence = base_confidence
            
            # Boost confidence if there are common neighbors with this relation
            relation_boost = 0
            for neighbor in common_neighbors:
                if self.graph.has_edge(source, neighbor):
                    edge_data = self.graph[source][neighbor]
                    if edge_data.get('relation') == relation:
                        relation_boost += 0.1
                if self.graph.has_edge(target, neighbor):
                    edge_data = self.graph[target][neighbor]
                    if edge_data.get('relation') == relation:
                        relation_boost += 0.1
                        
            confidence += min(0.3, relation_boost)
            
            # Add some randomness for exploration
            confidence += np.random.uniform(-0.05, 0.05)
            
            if confidence > 0.4:  # Only suggest if reasonably confident
                suggestions.append((target, relation, confidence))
                
        return suggestions
    
    def _calculate_suggestion_priority(self, source, target, relation, confidence, existing_relations):
        """Calculate priority score for a relation suggestion"""
        priority = 0.0
        
        # Base priority from confidence
        priority += confidence * self.config.get('suggestion_semantic_weight', 0.6)
        
        # Novelty bonus for new relation types
        if relation not in existing_relations:
            priority += self.config.get('suggestion_novelty_weight', 0.4) * 0.5
            
        # Diversity factor - encourage different relation types
        diversity_factor = self.config.get('suggestion_diversity_factor', 0.3)
        relation_frequency = sum(1 for _, _, d in self.graph.edges(data=True) 
                               if d.get('relation') == relation)
        total_edges = self.graph.number_of_edges()
        
        if total_edges > 0:
            relation_rarity = 1 - (relation_frequency / total_edges)
            priority += diversity_factor * relation_rarity
            
        # Structural importance - nodes with higher degree get priority
        source_degree = self.graph.degree(source)
        target_degree = self.graph.degree(target)
        degree_factor = min(0.2, (source_degree + target_degree) * 0.01)
        priority += degree_factor
        
        # Add small random component for tie-breaking
        priority += np.random.uniform(0, 0.02)
        
        return max(0.0, min(1.0, priority))
    
    def process_suggested_relations(self, suggestions):
        """Process and validate suggested relations"""
        if not suggestions:
            return 0
            
        processed_count = 0
        
        for suggestion in suggestions:
            triple = suggestion['triple']
            confidence = suggestion['confidence']
            suggested_weight = suggestion['suggested_weight']
            
            # Track the suggestion
            suggestion_key = str(triple)
            self.suggested_relations[suggestion_key] = {
                'suggestion': suggestion,
                'processed': True,
                'timestamp': self.training_metrics['iterations_completed']
            }
            
            # Validate the suggested relation
            decision, score = self.validate_triple(triple, suggested_weight, is_false=False)
            
            # Track suggestion outcome
            suggestion_record = {
                'triple': triple,
                'confidence': confidence,
                'decision': decision,
                'score': score,
                'priority': suggestion['priority'],
                'source': suggestion['source'],
                'iteration': self.training_metrics['iterations_completed'],
                'epoch': self.training_metrics['epochs_completed']
            }
            
            self.suggestion_history.append(suggestion_record)
            
            if decision == 'ACCEPT':
                self.suggestions_accepted += 1
                self.training_metrics['suggestions_accepted'] += 1
                self.add_triple_to_graph(triple, suggested_weight)
            elif decision == 'REJECT':
                self.suggestions_rejected += 1
                self.training_metrics['suggestions_rejected'] += 1
                
            self.training_metrics['suggestions_generated'] += 1
            self.suggestion_scores.append(confidence)
            processed_count += 1
            
        return processed_count
    
    def update_suggestion_metrics(self):
        """Update suggestion-related metrics"""
        total_suggestions = self.training_metrics['suggestions_generated']
        
        if total_suggestions > 0:
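            # Suggestion accuracy here is the acceptance rate among resolved
            # (ACCEPT or REJECT) suggestions; there is no ground truth for them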
            total_validated = (self.training_metrics['suggestions_accepted'] + 
                             self.training_metrics['suggestions_rejected'])
            if total_validated > 0:
                self.training_metrics['suggestion_accuracy'] = (
                    self.training_metrics['suggestions_accepted'] / total_validated
                )
                
            # Calculate average suggestion confidence
            if self.suggestion_scores:
                self.training_metrics['avg_suggestion_confidence'] = np.mean(self.suggestion_scores)
                
            # Calculate suggestion diversity (unique relation types suggested)
            if self.suggestion_history:
                suggested_relation_types = set()
                for record in self.suggestion_history:
                    suggested_relation_types.add(record['triple'][1])
                self.training_metrics['suggestion_diversity'] = len(suggested_relation_types)
    
    def process_pending_relations(self):
        """Resolve pending REVIEW relations in priority order, within configurable per-batch limits"""
        if not self.pending_relations:
            return 0
            
        resolved_relations = []
        max_attempts = self.config.get('max_resolution_attempts_per_batch', 50)
        top_limit = self.config.get('top_weight_resolution_limit', 100)
        
        # Filter out epoch-locked relations
        available_relations = {
            key: data for key, data in self.pending_relations.items() 
            if not data.get('epoch_locked', False)
        }
        
        if not available_relations:
            return 0
        
        # Sort relations by priority weight (highest first)
        sorted_relations = sorted(
            available_relations.items(),
            key=lambda x: x[1].get('priority_weight', 0.5),
            reverse=True
        )
        
        # Take only top N highest-weighted pending relations
        top_relations = sorted_relations[:top_limit]
        
        # Process up to max_attempts relations this batch
        attempts_this_batch = 0
        
        for triple_key, relation_data in top_relations:
            if attempts_this_batch >= max_attempts:
                break
                
            # Check if this relation has already been attempted 3 times this batch cycle
            if self.relation_attempts.get(triple_key, 0) >= 3:
                continue
                
            triple = relation_data['triple']
            edge_weight = relation_data['edge_weight']
            is_false = relation_data['is_false']
            
            # Try to resolve the pending relation
            decision, score = self.validate_triple(triple, edge_weight, is_false)
            attempts_this_batch += 1
            
            if decision in ['ACCEPT', 'REJECT']:
                resolved_relations.append(triple_key)
                self.training_metrics['top_weight_resolutions'] += 1
                
                # Add to graph if accepted
                if decision == 'ACCEPT':
                    self.add_triple_to_graph(triple, edge_weight)
        
        # Track resolution attempts per batch
        self.training_metrics['resolution_attempts_per_batch'].append(attempts_this_batch)
        
        return len(resolved_relations)
    
    def add_triple_to_graph(self, triple, edge_weight=1.0):
        """Add validated triple to the agent's graph"""
        subj, rel, obj = triple
        self.graph.add_edge(subj, obj, relation=rel, weight=edge_weight)
    
    def train_on_batch(self, triples_batch, edge_weights, false_flags):
        """Train on one batch: resolve pending relations by priority, process suggestions, then validate new triples"""
        # Reset attempt counters for new batch cycle
        self.relation_attempts = {}
        
        batch_accuracy = 0
        total_decisions = 0
        
        # First, try to resolve any pending relations from previous batches using prioritization
        resolved_count = self.process_pending_relations()
        
        # === RELATION SUGGESTION PROCESSING ===
        # Generate and process relation suggestions if enabled
        if self.config.get('enable_relation_suggestions', False):
            suggestions = self.generate_relation_suggestions()
            if suggestions:
                suggestion_count = self.process_suggested_relations(suggestions)
                if self.config.get('verbose', False) and suggestion_count > 0:
                    print(f"  {self.agent_id}: Generated {len(suggestions)} suggestions, processed {suggestion_count}")
        
        # Process new triples
        for triple, weight, is_false in zip(triples_batch, edge_weights, false_flags):
            decision, score = self.validate_triple(triple, weight, is_false)
            
            # Add to graph if accepted immediately
            if decision == 'ACCEPT':
                self.add_triple_to_graph(triple, weight)
            
            # Track accuracy only for resolved decisions (ACCEPT/REJECT)
            if decision in ['ACCEPT', 'REJECT']:
                correct = (decision == 'REJECT') if is_false else (decision == 'ACCEPT')
                batch_accuracy += correct
                total_decisions += 1
            
            self.training_metrics['triples_processed'] += 1
            if is_false and decision == 'REJECT':
                self.training_metrics['false_triples_detected'] += 1
        
        # Update accuracy based on resolved decisions only
        if total_decisions > 0:
            self.training_metrics['accuracy'] = batch_accuracy / total_decisions
            self.quality_scores.append(self.training_metrics['accuracy'])
        else:
            # If no decisions were made this batch, maintain previous accuracy
            if self.quality_scores:
                self.quality_scores.append(self.quality_scores[-1])
            else:
                self.quality_scores.append(0.0)
                
        # Update suggestion metrics
        self.update_suggestion_metrics()
    
    def get_resolution_stats(self):
        """Enhanced statistics about relation resolution including prioritization metrics"""
        total_relations = len(self.relation_review_count)  # Use review count for total seen
        pending_count = len(self.pending_relations)
        epoch_locked_count = sum(1 for data in self.pending_relations.values() if data.get('epoch_locked', False))
        resolved_count = self.training_metrics['relations_resolved']
        forced_count = self.training_metrics['forced_decisions']
        top_weight_resolutions = self.training_metrics['top_weight_resolutions']
        
        # Calculate average resolution attempts per batch
        avg_attempts_per_batch = 0
        if self.training_metrics['resolution_attempts_per_batch']:
            avg_attempts_per_batch = np.mean(self.training_metrics['resolution_attempts_per_batch'])
        
        # Calculate priority weight distribution of pending relations
        if self.pending_relations:
            priority_weights = [data.get('priority_weight', 0.5) for data in self.pending_relations.values()]
            avg_priority_weight = np.mean(priority_weights)
            max_priority_weight = np.max(priority_weights)
            min_priority_weight = np.min(priority_weights)
        else:
            avg_priority_weight = 0.0
            max_priority_weight = 0.0
            min_priority_weight = 0.0
        
        return {
            'total_relations_seen': total_relations,
            'relations_resolved': resolved_count,
            'relations_pending': pending_count,
            'relations_epoch_locked': epoch_locked_count,
            'forced_decisions': forced_count,
            'top_weight_resolutions': top_weight_resolutions,
            'avg_attempts_per_batch': avg_attempts_per_batch,
            'avg_priority_weight': avg_priority_weight,
            'max_priority_weight': max_priority_weight,
            'min_priority_weight': min_priority_weight,
            'resolution_rate': resolved_count / total_relations if total_relations > 0 else 0,
            'pending_rate': pending_count / total_relations if total_relations > 0 else 0,
            'epoch_locked_rate': epoch_locked_count / total_relations if total_relations > 0 else 0,
            'forced_rate': forced_count / total_relations if total_relations > 0 else 0,
            'top_weight_rate': top_weight_resolutions / resolved_count if resolved_count > 0 else 0
        }
    
    def complete_epoch(self):
        """Enhanced epoch completion with suggestion tracking reset"""
        self.training_metrics['epochs_completed'] += 1
        
        # Store epoch-level metrics
        stats = self.get_graph_stats()
        resolution_stats = self.get_resolution_stats()
        
        self.training_metrics['epoch_accuracies'].append(self.training_metrics['accuracy'])
        self.training_metrics['epoch_nodes'].append(stats['nodes'])
        self.training_metrics['epoch_edges'].append(stats['edges'])
        
        # Log detailed epoch completion information
        if self.config.get('verbose', False):
            print(f"\n--- Agent {self.agent_id} Epoch {self.training_metrics['epochs_completed']} Complete ---")
            print(f"Accuracy: {self.training_metrics['accuracy']:.3f}")
            print(f"Graph: {stats['nodes']} nodes, {stats['edges']} edges")
            print(f"Resolution: {resolution_stats['relations_resolved']} resolved, {resolution_stats['relations_pending']} pending")
            print(f"Top-weight resolutions: {resolution_stats['top_weight_resolutions']}")
            print(f"Avg attempts per batch: {resolution_stats['avg_attempts_per_batch']:.1f}")
            
            # Log suggestion statistics
            if self.config.get('enable_relation_suggestions', False):
                print(f"Suggestions: {self.training_metrics['suggestions_generated']} generated, "
                      f"{self.training_metrics['suggestions_accepted']} accepted, "
                      f"{self.training_metrics['suggestions_rejected']} rejected")
                if self.training_metrics['suggestions_generated'] > 0:
                    print(f"Suggestion accuracy: {self.training_metrics['suggestion_accuracy']:.3f}")
                    print(f"Avg suggestion confidence: {self.training_metrics['avg_suggestion_confidence']:.3f}")
                    print(f"Suggestion diversity: {self.training_metrics['suggestion_diversity']} relation types")
            
            if resolution_stats['relations_pending'] > 0:
                print(f"Priority range: {resolution_stats['min_priority_weight']:.3f} - {resolution_stats['max_priority_weight']:.3f}")
        
        # CRITICAL FIX: Reset both attempt counts AND review counts for fresh epoch integration
        self.relation_attempts = {}
        self.relation_review_count = {}
        
        # Clear pending relations (fresh start for new epoch)
        self.pending_relations = {}
        
        # === RESET SUGGESTION TRACKING FOR NEW EPOCH ===
        self.suggested_relations = {}
        # Note: Keep suggestion_history for cross-epoch analysis, but reset per-epoch counters
        
        # Reset batch-level tracking for new epoch
        self.training_metrics['resolution_attempts_per_batch'] = []
        
    def get_graph_stats(self):
        """Get comprehensive graph statistics"""
        # Safely compute average clustering
        try:
            if self.graph.number_of_nodes() > 1 and self.graph.number_of_edges() > 0:
                avg_clustering = nx.average_clustering(self.graph.to_undirected())
            else:
                avg_clustering = 0.0
        except Exception:
            avg_clustering = 0.0
            
        return {
            'nodes': self.graph.number_of_nodes(),
            'edges': self.graph.number_of_edges(),
            'density': nx.density(self.graph),
            'avg_clustering': avg_clustering,
            'connected_components': nx.number_weakly_connected_components(self.graph),
            'avg_degree': np.mean([d for n, d in self.graph.degree()]) if self.graph.number_of_nodes() > 0 else 0
        }
    
    def get_epoch_summary(self):
        """Get summary of training across all epochs including suggestion metrics"""
        resolution_stats = self.get_resolution_stats()
        
        summary = {
            'agent_id': self.agent_id,
            'epochs_completed': self.training_metrics['epochs_completed'],
            'total_iterations': self.training_metrics['iterations_completed'],
            'total_triples_processed': self.training_metrics['triples_processed'],
            'epoch_accuracies': self.training_metrics['epoch_accuracies'],
            'epoch_nodes': self.training_metrics['epoch_nodes'],
            'epoch_edges': self.training_metrics['epoch_edges'],
            'final_accuracy': self.training_metrics['accuracy'],
            'false_triples_detected': self.training_metrics['false_triples_detected'],
            'decision_distribution': self.decision_log.copy(),
            'final_graph_stats': self.get_graph_stats(),
            'resolution_stats': resolution_stats
        }
        
        # Add suggestion metrics if enabled
        if self.config.get('enable_relation_suggestions', False):
            summary['suggestion_stats'] = {
                'suggestions_generated': self.training_metrics['suggestions_generated'],
                'suggestions_accepted': self.training_metrics['suggestions_accepted'],
                'suggestions_rejected': self.training_metrics['suggestions_rejected'],
                'suggestion_accuracy': self.training_metrics['suggestion_accuracy'],
                'avg_suggestion_confidence': self.training_metrics['avg_suggestion_confidence'],
                'suggestion_diversity': self.training_metrics['suggestion_diversity'],
                'total_suggestions_in_history': len(self.suggestion_history)
            }
            
        return summary

print("ValidationAgent class defined successfully")


ValidationAgent class defined successfully
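
As a minimal illustration of the accuracy bookkeeping in `train_on_batch`, the sketch below scores a batch using only resolved decisions, so `REVIEW` outcomes neither help nor hurt the score. The helper name `resolved_accuracy` and the sample decisions are hypothetical and not part of the notebook.

```python
def resolved_accuracy(decisions, false_flags):
    """Accuracy over ACCEPT/REJECT decisions only; REVIEW is skipped.

    Returns None when nothing was resolved, so the caller can decide
    whether to carry the previous value forward.
    """
    correct = 0
    resolved = 0
    for decision, is_false in zip(decisions, false_flags):
        if decision not in ("ACCEPT", "REJECT"):
            continue  # still pending, not counted
        # A false triple should be rejected; a real one should be accepted
        expected = "REJECT" if is_false else "ACCEPT"
        correct += decision == expected
        resolved += 1
    return correct / resolved if resolved else None


decisions = ["ACCEPT", "REJECT", "REVIEW", "ACCEPT"]
false_flags = [False, True, False, True]
print(resolved_accuracy(decisions, false_flags))  # 2 of 3 resolved decisions are correct
```

Returning `None` for an all-`REVIEW` batch is one possible design choice; the notebook instead repeats the previous quality score in that case.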



# SECTION 3: FALSE TRIPLE GENERATOR CLASS


In [3]:

class FalseTripleGenerator:
    """Generate realistic false triples from real ConceptNet data"""
    
    def __init__(self, real_triples_df):
        self.real_triples = real_triples_df
        self.subjects = list(set(real_triples_df['subject'].values))
        self.relations = list(set(real_triples_df['relation'].values))
        self.objects = list(set(real_triples_df['object'].values))
        
        # Get edge weight distribution for realistic false weights
        if 'edge_weight' in real_triples_df.columns:
            self.weights = real_triples_df['edge_weight'].values
        else:
            self.weights = np.ones(len(real_triples_df))  # Default to 1.0
        
    def generate_false_triple(self):
        """Generate a false triple by mixing real components"""
        # Draw once so the strategies occur with the stated 70% / 20% / 10% split
        strategy_roll = np.random.random()

        # Strategy 1: Random recombination (70%)
        if strategy_roll < 0.7:
            subj = np.random.choice(self.subjects)
            rel = np.random.choice(self.relations)
            obj = np.random.choice(self.objects)
            
            # Ensure it's not a real triple
            attempts = 0
            while self._is_real_triple(subj, rel, obj) and attempts < 10:
                obj = np.random.choice(self.objects)
                attempts += 1
                
        # Strategy 2: Semantic contradiction (20%)
        elif strategy_roll < 0.9:
            # Take a real triple and swap subject/object
            real_triple = self.real_triples.sample(1).iloc[0]
            subj = real_triple['object']
            rel = real_triple['relation']
            obj = real_triple['subject']
            
        # Strategy 3: Nonsensical relations (10%)
        else:
            real_triple = self.real_triples.sample(1).iloc[0]
            subj = real_triple['subject']
            rel = np.random.choice(self.relations)
            obj = real_triple['object']
            
            # Ensure different relation
            attempts = 0
            while rel == real_triple['relation'] and attempts < 10:
                rel = np.random.choice(self.relations)
                attempts += 1
        
        # Generate a false but realistic edge weight
        # False triples tend to have lower weights in practice
        false_weight = np.random.choice(self.weights) * np.random.uniform(0.3, 0.8)
        false_weight = max(0.1, min(1.0, false_weight))  # Clamp to valid range
                
        return (subj, rel, obj), false_weight
    
    def _is_real_triple(self, subj, rel, obj):
        """Check if a triple exists in real data"""
        return len(self.real_triples[
            (self.real_triples['subject'] == subj) &
            (self.real_triples['relation'] == rel) &
            (self.real_triples['object'] == obj)
        ]) > 0
    
    def inject_false_triples(self, real_batch, real_weights, false_ratio=0.15):
        """Inject false triples into a batch of real triples"""
        num_false = int(len(real_batch) * false_ratio)
        false_triples = []
        false_weights = []
        false_flags = [False] * len(real_batch)
        
        # Generate false triples
        for _ in range(num_false):
            false_triple, false_weight = self.generate_false_triple()
            false_triples.append(false_triple)
            false_weights.append(false_weight)
        
        # Combine and shuffle
        all_triples = list(real_batch) + false_triples
        all_weights = list(real_weights) + false_weights
        all_flags = false_flags + [True] * num_false
        
        # Shuffle together
        combined = list(zip(all_triples, all_weights, all_flags))
        np.random.shuffle(combined)
        triples, weights, flags = zip(*combined)
        
        return list(triples), list(weights), list(flags)

print("FalseTripleGenerator class defined successfully")


FalseTripleGenerator class defined successfully
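
The three corruption strategies and the real-triple check can also be sketched without pandas. This is an illustrative sketch under stated assumptions, not the notebook's implementation: `make_false_triple` and the toy `real` list are hypothetical, a single random draw selects the strategy, and a `set` provides constant-time membership tests in place of a DataFrame scan.

```python
import random

# Toy stand-in for the real ConceptNet triples (hypothetical data)
real = [
    ("cat", "IsA", "animal"),
    ("dog", "IsA", "animal"),
    ("bird", "CapableOf", "fly"),
    ("fish", "AtLocation", "water"),
]
real_set = set(real)
subjects = sorted({s for s, _, _ in real})
relations = sorted({r for _, r, _ in real})
objects = sorted({o for _, _, o in real})


def make_false_triple(rng, max_attempts=10):
    """Return a corrupted triple that is not in `real_set`, or None if none was found."""
    for _ in range(max_attempts):
        roll = rng.random()  # one draw decides the strategy
        if roll < 0.7:
            # Random recombination of known components
            candidate = (rng.choice(subjects), rng.choice(relations), rng.choice(objects))
        elif roll < 0.9:
            # Swap subject and object of a real triple
            s, r, o = rng.choice(real)
            candidate = (o, r, s)
        else:
            # Keep subject and object, replace the relation with a different one
            s, r, o = rng.choice(real)
            others = [x for x in relations if x != r]
            candidate = (s, rng.choice(others) if others else r, o)
        if candidate not in real_set:  # O(1) lookup
            return candidate
    return None


rng = random.Random(0)
samples = [make_false_triple(rng) for _ in range(200)]
print(samples[:3])
```

Checking every strategy against `real_set`, rather than only the recombination branch, is a design choice of this sketch; swapping the arguments of a symmetric relation can otherwise reproduce a true statement.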



# SECTION 4: DATA LOADING AND PREPROCESSING


In [4]:

print("Loading ConceptNet data...")

try:
    # Try loading preprocessed parquet file first
    conceptnet_file = DATA_PATH / 'conceptnet_en_processed_for_graph.parquet.gzip'
    if conceptnet_file.exists():
        df_raw = pd.read_parquet(conceptnet_file)
        print(f"Loaded preprocessed data: {len(df_raw)} triples")
        
        # Map columns to expected format: relation_type, start_concept, end_concept, edge_weight
        if 'relation_type' in df_raw.columns and 'start_concept' in df_raw.columns:
            df = df_raw.rename(columns={
                'start_concept': 'subject',
                'relation_type': 'relation', 
                'end_concept': 'object'
            }).copy()
            print("Column mapping applied: start_concept -> subject, relation_type -> relation, end_concept -> object")
        else:
            # Fallback column mapping if different structure
            df = df_raw.copy()
            if df.shape[1] >= 3:
                df.columns = ['subject', 'relation', 'object'] + list(df.columns[3:])
    else:
        # Fallback to CSV
        conceptnet_file = DATA_PATH / 'conceptnet_en_triples.csv'
        df_raw = pd.read_csv(conceptnet_file)
        print(f"Loaded CSV data: {len(df_raw)} triples")
        
        # Map columns to expected format
        if 'relation_type' in df_raw.columns and 'start_concept' in df_raw.columns:
            df = df_raw.rename(columns={
                'start_concept': 'subject',
                'relation_type': 'relation',
                'end_concept': 'object'
            }).copy()
        else:
            df = df_raw.copy()
            if 'subject' not in df.columns:
                # Assume first three columns are subject, relation, object
                df.columns = ['subject', 'relation', 'object'] + list(df.columns[3:])
        
except Exception as e:
    print(f"Error loading data: {e}")
    # Create sample data for testing using realistic ConceptNet relations
    print("Creating sample data for testing...")
    df = pd.DataFrame({
        'subject': ['cat', 'dog', 'bird', 'fish', 'tree', 'happy', 'run', 'blue'] * 625,
        'relation': ['IsA', 'HasProperty', 'RelatedTo', 'CapableOf', 'AtLocation', 'FormOf', 'DerivedFrom', 'Synonym'] * 625,
        'object': ['animal', 'pet', 'living_thing', 'water', 'forest', 'emotion', 'move', 'color'] * 625,
        'edge_weight': np.random.choice([0.5, 0.7, 1.0], size=5000, p=[0.1, 0.2, 0.7])  # Realistic weight distribution
    })

# Sample data for experiment

def ensure_all_nodes_connected(df, sample_size):
    """
    Ensure every subject and object in the sample is connected to at least one other node.
    """
    # Start with a random sample
    if len(df) <= sample_size:
        sample = df.copy()
    else:
        sample = df.sample(n=sample_size, random_state=42)

    # Build undirected graph for connectivity check
    G = nx.from_pandas_edgelist(sample, 'subject', 'object', create_using=nx.Graph())

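    # Find isolated nodes (degree 0). Note: every node here comes from an edge in
    # `sample`, so its degree is at least 1 and this list is normally empty.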
    all_nodes = set(sample['subject']).union(set(sample['object']))
    node_degrees = dict(G.degree(all_nodes))
    isolated_nodes = [node for node, deg in node_degrees.items() if deg == 0]

    # For each isolated node, try to add a triple from df that connects it to the sample
    for node in isolated_nodes:
        # Find triples in df (not already in sample) where node is subject or object
        candidates = df[
            ((df['subject'] == node) | (df['object'] == node)) &
            ~df.index.isin(sample.index)
        ]
        # Prefer triples that connect to an already-included node
        candidates = candidates[
            (candidates['subject'].isin(all_nodes)) | (candidates['object'].isin(all_nodes))
        ]
        if not candidates.empty:
            # Add the first candidate triple
            sample = pd.concat([sample, candidates.iloc[[0]]], ignore_index=True)
            # Update graph and node set
            G.add_edge(candidates.iloc[0]['subject'], candidates.iloc[0]['object'])
            all_nodes.add(candidates.iloc[0]['subject'])
            all_nodes.add(candidates.iloc[0]['object'])

    # Remove any remaining isolated nodes (if no connecting triple exists)
    G = nx.from_pandas_edgelist(sample, 'subject', 'object', create_using=nx.Graph())
    node_degrees = dict(G.degree(all_nodes))
    still_isolated = [node for node, deg in node_degrees.items() if deg == 0]
    if still_isolated:
        sample = sample[
            ~sample['subject'].isin(still_isolated) &
            ~sample['object'].isin(still_isolated)
        ].reset_index(drop=True)

    # If sample is now larger than sample_size, downsample
    if len(sample) > sample_size:
        sample = sample.sample(n=sample_size, random_state=42).reset_index(drop=True)

    return sample

if len(df) > CONFIG['sample_size']:
    df_sample = ensure_all_nodes_connected(df, CONFIG['sample_size'])
else:
    df_sample = df.copy()

print(f"Using {len(df_sample)} triples for experiment")
print(f"Unique subjects: {df_sample['subject'].nunique()}")
print(f"Unique relations: {df_sample['relation'].nunique()}")
print(f"Unique objects: {df_sample['object'].nunique()}")

# Show edge weight distribution if available
if 'edge_weight' in df_sample.columns:
    print(f"Edge weight distribution:")
    print(f"  Mean: {df_sample['edge_weight'].mean():.3f}")
    print(f"  Range: [{df_sample['edge_weight'].min():.1f}, {df_sample['edge_weight'].max():.1f}]")
    weight_counts = df_sample['edge_weight'].value_counts().head()
    print(f"  Top weights: {dict(weight_counts)}")

# Create false triple generator
false_generator = FalseTripleGenerator(df_sample)
print("False triple generator initialized")


Loading ConceptNet data...
Loaded preprocessed data: 1655522 triples
Column mapping applied: start_concept -> subject, relation_type -> relation, end_concept -> object
Using 50000 triples for experiment
Unique subjects: 21409
Unique relations: 46
Unique objects: 38234
Edge weight distribution:
  Mean: 0.941
  Range: [0.1, 11.6]
  Top weights: {1.0: np.int64(39379), 0.5: np.int64(2714), 0.25: np.int64(1415), 2.0: np.int64(1228), 2.828: np.int64(198)}
False triple generator initialized
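
One property worth keeping in mind for the connectivity check: a graph built only from an edge list has no degree-0 nodes, because every node enters the graph as the endpoint of some edge. The sketch below uses a hypothetical `degree_counts` helper and toy triples to count degrees directly. It also lists nodes attached by a single edge, which is only an assumption about what a stricter check might look for.

```python
from collections import Counter


def degree_counts(triples):
    """Undirected degree of every node, counting each triple as one edge."""
    degree = Counter()
    for subj, _rel, obj in triples:
        degree[subj] += 1
        degree[obj] += 1  # a self-loop (subj == obj) therefore counts twice
    return degree


triples = [
    ("cat", "IsA", "animal"),
    ("dog", "IsA", "animal"),
    ("cat", "CapableOf", "purr"),
]
degree = degree_counts(triples)
isolated = [n for n, d in degree.items() if d == 0]
leaves = sorted(n for n, d in degree.items() if d == 1)
print(isolated)  # no node can have degree 0 here
print(leaves)    # nodes held in the graph by exactly one triple
```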


In [5]:
df_sample

| | relation | subject | object | edge_weight |
|---|---|---|---|---|
| 728667 | FormOf | v | swinck | 1.0 |
| 366573 | DerivedFrom | superviolent | violent | 1.0 |
| 904532 | IsA | purple_sand_tilefish | n | 1.0 |
| 566489 | FormOf | n | febricity | 1.0 |
| 893824 | IsA | international_unit_of_measure | n | 1.0 |
| ... | ... | ... | ... | ... |
| 1391575 | RelatedTo | en_2 | escapade | 1.0 |
| 668416 | FormOf | n | pigeonhole_principle | 1.0 |
| 1209303 | RelatedTo | n | hebetate | 2.0 |
| 725880 | FormOf | n | sulkiness | 1.0 |



# SECTION 5: AGENT INITIALIZATION


In [None]:

print(f"\nInitializing {CONFIG['num_agents']} validation agents...")

agents = []
for i in range(CONFIG['num_agents']):
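    # Assign a unique random seed to each agent.
    # The large spacing keeps agent seeds distinct after the per-epoch and
    # per-iteration offsets added during training (a spacing of 1000 would make
    # agent i at iteration j+1 share a seed with agent i+1 at iteration j);
    # the modulo keeps the value inside NumPy's valid seed range.
    agent_config = CONFIG.copy()
    agent_config['agent_seed'] = (int(time.time() * 1000) + i * 1_000_003) % (2**32)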
    agents.append(ValidationAgent(f"Agent_{i+1}", agent_config))
    print(f"  Agent {i+1} initialized with seed {agent_config['agent_seed']}")

print(f"\nAll {len(agents)} agents ready for training")
print("Agent configurations:")
for agent in agents:
    print(f"  {agent.agent_id}: Graph nodes={agent.graph.number_of_nodes()}, edges={agent.graph.number_of_edges()}")

# Calculate all unique subjects and objects in the dataset
all_subjects = set(df_sample['subject'].unique())
all_objects = set(df_sample['object'].unique())
all_entities = all_subjects | all_objects
print(f"\nTotal unique subjects and objects: {len(all_entities):,}")
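
Per-agent seeds that are later offset by epoch and iteration can coincide when the offsets share a common step. One way to make such structured collisions unlikely, sketched here with a hypothetical `derive_seed` helper that is not part of the notebook, is to hash the base seed, agent index, epoch, and iteration into a 32-bit value. The result also stays within the range accepted by `np.random.seed`.

```python
import hashlib


def derive_seed(base_seed, agent_index, epoch, iteration):
    """Map (base_seed, agent, epoch, iteration) to an integer in [0, 2**32)."""
    key = f"{base_seed}:{agent_index}:{epoch}:{iteration}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes of the digest as an unsigned 32-bit integer
    return int.from_bytes(digest[:4], "big")


# Seeds for 7 agents, 10 epochs, 100 iterations (values taken from CONFIG in this notebook)
seeds = {
    derive_seed(42, agent, epoch, iteration)
    for agent in range(7)
    for epoch in range(10)
    for iteration in range(100)
}
print(len(seeds))
```

A fixed `base_seed` makes runs repeatable, whereas a time-based base, as used above, gives different batches on every run.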



# SECTION 6: ADAPTIVE TRAINING SYSTEM
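
The training loop in this section relies on two small measurements: entity coverage (the share of dataset entities present in an agent's graph) and a stagnation test over recent coverage checks. A minimal sketch of both follows. `entity_coverage` and `has_stalled` are hypothetical stand-ins that use plain sets and lists, and the window of five checks and the 0.01 threshold are illustrative assumptions rather than the configured values.

```python
def entity_coverage(graph_nodes, all_entities):
    """Fraction of dataset entities that appear among the agent's graph nodes."""
    if not all_entities:
        return 0.0  # avoid dividing by zero on an empty dataset
    return len(set(graph_nodes) & set(all_entities)) / len(all_entities)


def has_stalled(history, threshold=0.01, window=5):
    """True when coverage gained less than `threshold` over the last `window` checks."""
    if len(history) < 2:
        return False  # not enough checks to judge
    # Compare against the value `window` checks ago, or the first check if history is short
    start = history[-(window + 1)] if len(history) > window else history[0]
    return (history[-1] - start) < threshold


all_entities = {"cat", "dog", "animal", "fish"}
print(entity_coverage({"cat", "animal", "unknown"}, all_entities))  # 2 of 4 entities covered
print(has_stalled([0.10, 0.20, 0.30]))     # coverage is still rising
print(has_stalled([0.30, 0.301, 0.302]))   # gain is below the threshold
```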


In [None]:

def get_entity_coverage(agent_graph, all_entities):
    """Calculate entity coverage for an agent"""
    agent_nodes = set(agent_graph.nodes())
    return len(agent_nodes & all_entities) / len(all_entities)

def adaptive_training_system(agents, df_sample, false_generator, all_entities, config):
    """
    Intelligent training system that tries coverage-based training first,
    then falls back to epoch-based training if needed.
    """
    print("\n" + "="*80)
    print("🚀 STARTING ADAPTIVE MULTI-AGENT TRAINING SYSTEM")
    print("="*80)
    
    training_history = []
    training_start_time = time.time()
    training_mode = "DETERMINING"
    
    # Configuration
    target_coverage = config['target_coverage']
    timeout_minutes = config['coverage_timeout_minutes']
    max_coverage_iterations = config['coverage_max_iterations']
    check_freq = config['coverage_check_frequency']
    progress_threshold = config['coverage_progress_threshold']
    
    print(f"🎯 Target Coverage: {target_coverage*100:.1f}%")
    print(f"⏰ Coverage Timeout: {timeout_minutes} minutes")
    print(f"🔄 Max Coverage Iterations: {max_coverage_iterations}")
    print(f"📊 Adaptive Mode: {'ENABLED' if config['adaptive_training_mode'] else 'DISABLED'}")
    
    # === PHASE 1: TRY COVERAGE-BASED TRAINING ===
    if config['adaptive_training_mode'] and config['coverage_based_training']:
        print(f"\n🎯 PHASE 1: ATTEMPTING COVERAGE-BASED CONTINUOUS TRAINING")
        print("-" * 60)
        
        coverage_reached = [False] * len(agents)
        coverage_start_time = time.time()
        timeout_time = coverage_start_time + (timeout_minutes * 60)
        
        # Track coverage progress for stagnation detection
        coverage_history = {agent.agent_id: [] for agent in agents}
        last_progress_check = 0
        
        for iteration in range(max_coverage_iterations):
            # Check timeout
            if time.time() > timeout_time:
                print(f"⏰ TIMEOUT: Coverage-based training exceeded {timeout_minutes} minutes")
                break
            
            print(f"\n📍 Coverage Iteration {iteration + 1}/{max_coverage_iterations}")
            print("-" * 40)
            
            # Train each agent
            iteration_results = []
            for agent in agents:
                # Per-agent, per-iteration seed, wrapped into NumPy's valid range [0, 2**32)
                iteration_seed = (agent.config.get('agent_seed', 42) + iteration * 1000) % (2**32)
                np.random.seed(iteration_seed)
                random.seed(iteration_seed)
                
                # Sample batch
                batch_data = df_sample.sample(n=min(config['batch_size'], len(df_sample)), replace=True)
                real_triples = [(row['subject'], row['relation'], row['object']) 
                               for _, row in batch_data.iterrows()]
                real_weights = batch_data.get('edge_weight', pd.Series([1.0] * len(batch_data))).values
                
                # Inject false triples
                mixed_triples, mixed_weights, truth_flags = false_generator.inject_false_triples(
                    real_triples, real_weights, config['false_triple_ratio']
                )
                
                # Train agent
                agent.train_on_batch(mixed_triples, mixed_weights, truth_flags)
                agent.training_metrics['iterations_completed'] = iteration + 1
                stats = agent.get_graph_stats()
                accuracy = agent.training_metrics['accuracy']
                
                iteration_results.append({
                    'agent_id': agent.agent_id,
                    'mode': 'COVERAGE_BASED',
                    'iteration': iteration + 1,
                    'accuracy': accuracy,
                    'graph_nodes': stats['nodes'],
                    'graph_edges': stats['edges'],
                    'graph_density': stats['density'],
                    'decisions': agent.decision_log.copy()
                })
                
                if config['verbose'] and iteration % 5 == 0:
                    print(f"    {agent.agent_id}: Acc={accuracy:.3f}, Nodes={stats['nodes']}, Edges={stats['edges']}")
            
            # Check coverage every N iterations
            if (iteration + 1) % check_freq == 0:
                print(f"\n📊 COVERAGE CHECK (Iteration {iteration + 1}):")
                current_coverages = []
                
                for idx, agent in enumerate(agents):
                    coverage = get_entity_coverage(agent.graph, all_entities)
                    coverage_history[agent.agent_id].append(coverage)
                    current_coverages.append(coverage)
                    
                    if not coverage_reached[idx] and coverage >= target_coverage:
                        print(f"🎉 {agent.agent_id} reached {coverage*100:.1f}% coverage!")
                        coverage_reached[idx] = True
                    else:
                        print(f"   {agent.agent_id}: {coverage*100:.1f}% coverage")
                
                # Check if all agents reached target
                if all(coverage_reached):
                    elapsed_time = time.time() - coverage_start_time
                    print(f"\n🏆 SUCCESS! All agents reached {target_coverage*100:.1f}% coverage in {elapsed_time/60:.1f} minutes")
                    training_mode = "COVERAGE_SUCCESS"
                    training_history.extend(iteration_results)
                    break
                
                # Check for stagnation (no progress in coverage)
                if iteration > 50 and (iteration - last_progress_check) >= 50:
                    progress_detected = False
                    for agent_id, hist in coverage_history.items():
                        if len(hist) >= 2:
                            recent_progress = hist[-1] - hist[-6] if len(hist) >= 6 else hist[-1] - hist[0]
                            if recent_progress >= progress_threshold:
                                progress_detected = True
                                break
                    
                    if not progress_detected:
                        avg_coverage = np.mean(current_coverages)
                        print(f"⚠️  STAGNATION: No significant progress detected (avg coverage: {avg_coverage*100:.1f}%)")
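                        if avg_coverage < 0.5:  # Very low coverage
                            print("💔 Coverage-based training appears ineffective - will fallback to epochs")
                            # Record this iteration before leaving the loop; the break skips the extend below
                            training_history.extend(iteration_results)
                            break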
                    
                    last_progress_check = iteration
            
            training_history.extend(iteration_results)
        
        # Assess coverage-based training results
        if not all(coverage_reached):
            final_coverages = [get_entity_coverage(agent.graph, all_entities) for agent in agents]
            avg_coverage = np.mean(final_coverages)
            elapsed_time = time.time() - coverage_start_time
            
            print(f"\n📊 COVERAGE-BASED TRAINING RESULTS:")
            print(f"   Average Coverage: {avg_coverage*100:.1f}%")
            print(f"   Agents at Target: {sum(coverage_reached)}/{len(agents)}")
            print(f"   Time Elapsed: {elapsed_time/60:.1f} minutes")
            
            if not config['fallback_to_epochs']:
                print("🛑 Fallback disabled - stopping training")
                training_mode = "COVERAGE_INCOMPLETE"
            else:
                print("🔄 Proceeding to epoch-based training fallback")
                training_mode = "FALLBACK_TO_EPOCHS"
        else:
            training_mode = "COVERAGE_SUCCESS"
    else:
        print(f"⏩ SKIPPING coverage-based training (disabled in config)")
        training_mode = "EPOCH_BASED_ONLY"
    
    # === PHASE 2: EPOCH-BASED TRAINING (FALLBACK OR PRIMARY) ===
    if training_mode in ["FALLBACK_TO_EPOCHS", "EPOCH_BASED_ONLY"]:
        print(f"\n🏛️ PHASE 2: EPOCH-BASED TRAINING")
        print("-" * 60)
        
        if training_mode == "FALLBACK_TO_EPOCHS":
            print("🔄 Coverage-based training incomplete - using epoch-based as fallback")
        else:
            print("📚 Using traditional epoch-based training as primary method")
        
        # Traditional epoch-based training
        max_epochs = config.get('num_epochs', 5)
        max_iterations = config.get('max_iterations', 150)
        batch_size = config['batch_size']
        
        for epoch in range(max_epochs):
            print(f"\n📖 EPOCH {epoch+1}/{max_epochs}")
            print("=" * 30)
            
            for iteration in range(max_iterations):
                if config['epoch_verbose'] and iteration % 20 == 0:
                    print(f"   Iteration {iteration + 1}/{max_iterations}")
                    
                    # Show resolution summary every 50 iterations  
                    if iteration % 50 == 0:
                        print("   Resolution Status:")
                        for agent in agents:
                            res_stats = agent.get_resolution_stats()
                            print(f"     {agent.agent_id}: {res_stats['resolution_rate']:.1%} resolved, " +
                                  f"{res_stats['pending_rate']:.1%} pending, " +
                                  f"{res_stats['forced_rate']:.1%} forced")
                
                # Each agent trains independently
                iteration_results = []
                for agent in agents:
                    # Per-agent, per-epoch, per-iteration seed, wrapped into NumPy's valid range [0, 2**32)
                    iteration_seed = (agent.config.get('agent_seed', 42) + epoch * 1000 + iteration) % (2**32)
                    np.random.seed(iteration_seed)
                    random.seed(iteration_seed)
                    
                    # Sample and train
                    batch_data = df_sample.sample(n=min(batch_size, len(df_sample)), replace=True)
                    real_triples = [(row['subject'], row['relation'], row['object']) 
                                   for _, row in batch_data.iterrows()]
                    real_weights = batch_data.get('edge_weight', pd.Series([1.0] * len(batch_data))).values
                    
                    mixed_triples, mixed_weights, truth_flags = false_generator.inject_false_triples(
                        real_triples, real_weights, config['false_triple_ratio']
                    )
                    
                    agent.train_on_batch(mixed_triples, mixed_weights, truth_flags)
                    agent.training_metrics['iterations_completed'] = iteration + 1
                    stats = agent.get_graph_stats()
                    accuracy = agent.training_metrics['accuracy']
                    
                    iteration_results.append({
                        'agent_id': agent.agent_id,
                        'mode': 'EPOCH_BASED',
                        'epoch': epoch + 1,
                        'iteration': iteration + 1,
                        'accuracy': accuracy,
                        'graph_nodes': stats['nodes'],
                        'graph_edges': stats['edges'],
                        'graph_density': stats['density'],
                        'decisions': agent.decision_log.copy()
                    })
                
                training_history.extend(iteration_results)
                
                # Verbose output
                if config['verbose'] and iteration % 30 == 0:
                    for agent in agents:
                        stats = agent.get_graph_stats()
                        accuracy = agent.training_metrics['accuracy']
                        resolution_stats = agent.get_resolution_stats()
                        print(f"      {agent.agent_id}: Acc={accuracy:.3f}, Nodes={stats['nodes']}, Edges={stats['edges']}")
                        print(f"        Resolved={resolution_stats['relations_resolved']}, " +
                              f"Pending={resolution_stats['relations_pending']}, " +
                              f"Forced={resolution_stats['forced_decisions']}, " +
                              f"Resolution Rate={resolution_stats['resolution_rate']:.2f}")
            
            # Complete epoch for all agents
            for agent in agents:
                agent.complete_epoch()
        
        training_mode = "EPOCH_BASED_COMPLETE"
    
    # === FINAL RESOLUTION PASS ===
    print(f"\n🔍 FINAL RESOLUTION PASS")
    print("-" * 40)
    
    total_pending_before = sum(len(agent.pending_relations) for agent in agents)
    print(f"Total pending relations before final pass: {total_pending_before}")
    
    if total_pending_before > 0:
        print("Forcing resolution of all pending relations...")
        
        for agent in agents:
            pending_count = len(agent.pending_relations)
            if pending_count > 0:
                print(f"  {agent.agent_id}: Resolving {pending_count} pending relations")
                
                # Force resolution of all pending relations
                pending_copy = list(agent.pending_relations.items())
                for triple_key, relation_data in pending_copy:
                    triple = relation_data['triple']
                    edge_weight = relation_data['edge_weight']
                    is_false = relation_data['is_false']
                    
                    # Force decision on this relation
                    decision, score = agent.validate_triple(triple, edge_weight, is_false, force_decision=True)
                    
                    if decision == 'ACCEPT':
                        agent.add_triple_to_graph(triple, edge_weight)
        
        total_pending_after = sum(len(agent.pending_relations) for agent in agents)
        print(f"Total pending relations after final pass: {total_pending_after}")
        
        if total_pending_after == 0:
            print("✅ All relations successfully resolved!")
        else:
            print(f"⚠️  {total_pending_after} relations still pending (should be 0)")
    
    # === TRAINING SUMMARY ===
    total_time = time.time() - training_start_time
    print(f"\n🎉 ADAPTIVE TRAINING COMPLETE!")
    print("=" * 50)
    print(f"🎯 Final Training Mode: {training_mode}")
    print(f"⏱️  Total Training Time: {total_time/60:.1f} minutes")
    print(f"📊 Total Training Records: {len(training_history)}")
    
    # Final coverage check
    print(f"\n📈 FINAL COVERAGE ANALYSIS:")
    for agent in agents:
        coverage = get_entity_coverage(agent.graph, all_entities)
        stats = agent.get_graph_stats()
        print(f"   {agent.agent_id}: {coverage*100:.1f}% coverage, {stats['nodes']} nodes, {stats['edges']} edges")
    
    return training_history, training_mode
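
The training loop seeds each agent with `agent_seed + epoch * 1000 + iteration`. Two agents whose seeds are close together can reach the same value at different steps, which would give them identical batches and weaken the independence the experiment relies on. A minimal sketch of a collision-resistant alternative, assuming NumPy's `SeedSequence` is acceptable here; `derive_step_seed` is a hypothetical helper and not part of the experiment code:

```python
import numpy as np

def derive_step_seed(agent_seed, epoch, iteration):
    """Hypothetical helper: hash (agent_seed, epoch, iteration) into one 32-bit seed."""
    seq = np.random.SeedSequence([agent_seed, epoch, iteration])
    return int(seq.generate_state(1)[0])

# The additive scheme maps two different (agent, step) combinations to the same seed
additive_a = 42 + 0 * 1000 + 1   # agent seed 42, epoch 0, iteration 1
additive_b = 43 + 0 * 1000 + 0   # agent seed 43, epoch 0, iteration 0
print(additive_a == additive_b)  # True: both are 43

# Hashing the three components keeps the two combinations apart
hashed_a = derive_step_seed(42, 0, 1)
hashed_b = derive_step_seed(43, 0, 0)
print(hashed_a == hashed_b)
```

The result can be passed to `np.random.seed` and `random.seed` exactly as the additive value is today, so the rest of the loop would not need to change.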



# SECTION 7: EXECUTE TRAINING


In [None]:

print("🎯 Initializing Adaptive Training System...")
required_coverage = CONFIG['target_coverage']  # Use config value

training_history, final_training_mode = adaptive_training_system(
    agents=agents,
    df_sample=df_sample, 
    false_generator=false_generator,
    all_entities=all_entities,
    config=CONFIG
)

# SECTION 8: SIMILARITY ANALYSIS FUNCTIONS

In [None]:

def calculate_jaccard_similarity(graph1, graph2):
    """Calculate Jaccard similarity between two graphs"""
    edges1 = set(graph1.edges())
    edges2 = set(graph2.edges())
    
    intersection = len(edges1.intersection(edges2))
    union = len(edges1.union(edges2))
    
    return intersection / union if union > 0 else 0

def calculate_weighted_jaccard_similarity(graph1, graph2):
    """Calculate weighted Jaccard similarity considering edge weights"""
    # Get all edges from both graphs
    all_edges = set(graph1.edges()) | set(graph2.edges())
    
    if not all_edges:
        return 0
    
    intersection_weight = 0
    union_weight = 0
    
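    for edge in all_edges:
        # An edge that exists but has no 'weight' attribute defaults to 1.0 (the NetworkX
        # convention); a default of 0 would treat it as if it were absent.
        weight1 = graph1[edge[0]][edge[1]].get('weight', 1.0) if graph1.has_edge(*edge) else 0
        weight2 = graph2[edge[0]][edge[1]].get('weight', 1.0) if graph2.has_edge(*edge) else 0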
        
        intersection_weight += min(weight1, weight2)
        union_weight += max(weight1, weight2)
    
    return intersection_weight / union_weight if union_weight > 0 else 0

def calculate_node_overlap(graph1, graph2):
    """Calculate node overlap between two graphs"""
    nodes1 = set(graph1.nodes())
    nodes2 = set(graph2.nodes())
    
    intersection = len(nodes1.intersection(nodes2))
    union = len(nodes1.union(nodes2))
    
    return intersection / union if union > 0 else 0

def calculate_structural_similarity(graph1, graph2):
    """Calculate structural similarity based on graph properties"""
    stats1 = {
        'nodes': graph1.number_of_nodes(),
        'edges': graph1.number_of_edges(),
        'density': nx.density(graph1),
        'avg_clustering': nx.average_clustering(graph1.to_undirected()) if graph1.number_of_nodes() > 0 else 0
    }
    
    stats2 = {
        'nodes': graph2.number_of_nodes(),
        'edges': graph2.number_of_edges(),
        'density': nx.density(graph2),
        'avg_clustering': nx.average_clustering(graph2.to_undirected()) if graph2.number_of_nodes() > 0 else 0
    }
    
    # Calculate normalized differences
    similarities = []
    for key in stats1.keys():
        if stats1[key] + stats2[key] > 0:
            sim = 1 - abs(stats1[key] - stats2[key]) / (stats1[key] + stats2[key])
            similarities.append(sim)
    
    return np.mean(similarities) if similarities else 0

def calculate_semantic_similarity(graph1, graph2):
    """Calculate semantic similarity based on relation types"""
    # Get relation distributions
    relations1 = [data['relation'] for _, _, data in graph1.edges(data=True) if 'relation' in data]
    relations2 = [data['relation'] for _, _, data in graph2.edges(data=True) if 'relation' in data]
    
    if not relations1 or not relations2:
        return 0
    
    # Create relation frequency vectors
    all_relations = list(set(relations1 + relations2))
    
    freq1 = np.array([relations1.count(rel) for rel in all_relations])
    freq2 = np.array([relations2.count(rel) for rel in all_relations])
    
    # Normalize
    freq1 = freq1 / np.sum(freq1) if np.sum(freq1) > 0 else freq1
    freq2 = freq2 / np.sum(freq2) if np.sum(freq2) > 0 else freq2
    
    # Calculate cosine similarity
    dot_product = np.dot(freq1, freq2)
    norms = np.linalg.norm(freq1) * np.linalg.norm(freq2)
    
    return dot_product / norms if norms > 0 else 0

def calculate_path_similarity(graph1, graph2, sample_size=100, seed=42):
    """Calculate similarity based on shortest-path lengths between sampled pairs of common nodes"""
    # Sort for a reproducible node order (set iteration order can vary between runs)
    common_nodes = sorted(set(graph1.nodes()).intersection(graph2.nodes()), key=str)
    
    if len(common_nodes) < 2:
        return 0
    
    # Draw random (source, target) pairs. Taking the first `sample_size` pairs of a nested
    # loop would make almost every pair start from the same source node. A local RNG leaves
    # the global random state untouched and scores (A, B) and (B, A) on the same pairs.
    rng = random.Random(seed)
    sample_pairs = [tuple(rng.sample(common_nodes, 2)) for _ in range(sample_size)]
    
    path_similarities = []
    
    for source, target in sample_pairs:
        try:
            path1 = nx.shortest_path_length(graph1, source, target)
            path2 = nx.shortest_path_length(graph2, source, target)
            
            # Similarity based on path length difference
            max_path = max(path1, path2)
            similarity = 1 - abs(path1 - path2) / max_path if max_path > 0 else 1
            path_similarities.append(similarity)
            
        except nx.NetworkXNoPath:
            # One or both graphs don't have a path
            path_similarities.append(0)
    
    return np.mean(path_similarities) if path_similarities else 0

print("Graph similarity functions defined successfully")
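
To make the difference between the plain and weighted Jaccard metrics concrete, here is a small worked example on two toy graphs. It is a sketch only: `edge_jaccard` and `weighted_edge_jaccard` are compact, hypothetical re-statements of the two functions above so that the block runs on its own, and the concept names and weights are invented for illustration.

```python
import networkx as nx

def edge_jaccard(g1, g2):
    """Shared edges divided by all distinct edges."""
    e1, e2 = set(g1.edges()), set(g2.edges())
    union = e1 | e2
    return len(e1 & e2) / len(union) if union else 0

def weighted_edge_jaccard(g1, g2):
    """Sum of per-edge minimum weights divided by sum of per-edge maximum weights."""
    numerator = denominator = 0.0
    for u, v in set(g1.edges()) | set(g2.edges()):
        w1 = g1[u][v].get('weight', 0) if g1.has_edge(u, v) else 0
        w2 = g2[u][v].get('weight', 0) if g2.has_edge(u, v) else 0
        numerator += min(w1, w2)
        denominator += max(w1, w2)
    return numerator / denominator if denominator > 0 else 0

g1 = nx.DiGraph()
g1.add_edge('cat', 'animal', weight=1.0)
g1.add_edge('cat', 'pet', weight=0.5)

g2 = nx.DiGraph()
g2.add_edge('cat', 'animal', weight=0.5)
g2.add_edge('dog', 'pet', weight=1.0)

plain = edge_jaccard(g1, g2)              # 1 shared edge / 3 distinct edges
weighted = weighted_edge_jaccard(g1, g2)  # 0.5 / (1.0 + 0.5 + 1.0)
print(f"plain={plain:.3f}, weighted={weighted:.3f}")
```

The plain score only counts the shared edge, while the weighted score is lower because the two agents disagree on how strongly `cat -> animal` holds.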

# SECTION 9: CALCULATE SIMILARITY MATRICES

In [None]:

print("\n" + "="*60)
print("CALCULATING AGENT SIMILARITY MATRIX")
print("="*60)

num_agents = len(agents)
similarity_results = {
    'jaccard': np.zeros((num_agents, num_agents)),
    'weighted_jaccard': np.zeros((num_agents, num_agents)),
    'node_overlap': np.zeros((num_agents, num_agents)),
    'structural': np.zeros((num_agents, num_agents)),
    'semantic': np.zeros((num_agents, num_agents)),
    'path_based': np.zeros((num_agents, num_agents))
}

print(f"Comparing {num_agents} agents pairwise...")

for i in range(num_agents):
    # Self-similarity is 1.0
    for metric in similarity_results.keys():
        similarity_results[metric][i][i] = 1.0
    
    # Every metric is symmetric by definition, so compute each pair once and mirror it
    for j in range(i + 1, num_agents):
        graph1 = agents[i].graph
        graph2 = agents[j].graph
        
        # Calculate all similarity metrics
        pair_scores = {
            'jaccard': calculate_jaccard_similarity(graph1, graph2),
            'weighted_jaccard': calculate_weighted_jaccard_similarity(graph1, graph2),
            'node_overlap': calculate_node_overlap(graph1, graph2),
            'structural': calculate_structural_similarity(graph1, graph2),
            'semantic': calculate_semantic_similarity(graph1, graph2),
            'path_based': calculate_path_similarity(graph1, graph2)
        }
        
        for metric, score in pair_scores.items():
            similarity_results[metric][i][j] = score
            similarity_results[metric][j][i] = score

print("Similarity calculations completed!")

# Calculate summary statistics
summary_stats = {}
for metric, matrix in similarity_results.items():
    # Get upper triangle (excluding diagonal)
    upper_triangle = matrix[np.triu_indices_from(matrix, k=1)]
    
    summary_stats[metric] = {
        'mean': np.mean(upper_triangle),
        'std': np.std(upper_triangle),
        'min': np.min(upper_triangle),
        'max': np.max(upper_triangle),
        'median': np.median(upper_triangle)
    }

print("\nSimilarity Summary Statistics:")
for metric, stats in summary_stats.items():
    print(f"\n{metric.upper()} Similarity:")
    print(f"  Mean: {stats['mean']:.4f}")
    print(f"  Std:  {stats['std']:.4f}")
    print(f"  Median: {stats['median']:.4f}")
    print(f"  Range: [{stats['min']:.4f}, {stats['max']:.4f}]")
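
The summary statistics above are taken over the upper triangle of each matrix, so that every agent pair is counted once and the diagonal is excluded. The following sketch shows that pattern on toy data; `pairwise_matrix` is a hypothetical helper and the edge sets are invented stand-ins for agent graphs.

```python
import numpy as np

def pairwise_matrix(items, similarity):
    """Fill a symmetric similarity matrix by computing each pair once."""
    n = len(items)
    matrix = np.eye(n)  # self-similarity of 1.0 on the diagonal
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i, j] = matrix[j, i] = similarity(items[i], items[j])
    return matrix

def set_jaccard(a, b):
    return len(a & b) / len(a | b)

# Toy "agents": each one is just the set of edge ids it accepted
edge_sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {1, 2, 3}]

matrix = pairwise_matrix(edge_sets, set_jaccard)
upper = matrix[np.triu_indices_from(matrix, k=1)]

print(len(upper))              # 6 pairs for 4 agents: n * (n - 1) / 2
print(round(upper.mean(), 4))  # mean pairwise similarity
```

With 7 agents the same indexing yields 21 pairwise values per metric, which is the sample size behind every mean and standard deviation reported in this section.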

# SECTION 10: AGENT PERFORMANCE ANALYSIS

In [None]:

print("\n" + "="*60)
print("AGENT PERFORMANCE ANALYSIS")
print("="*60)

# Collect final agent statistics
agent_performance = []

for agent in agents:
    stats = agent.get_graph_stats()
    metrics = agent.training_metrics
    
    # Calculate additional metrics
    total_decisions = sum(agent.decision_log.values())
    accept_rate = agent.decision_log['ACCEPT'] / total_decisions if total_decisions > 0 else 0
    reject_rate = agent.decision_log['REJECT'] / total_decisions if total_decisions > 0 else 0
    review_rate = agent.decision_log['REVIEW'] / total_decisions if total_decisions > 0 else 0
    
    # False positive rate: share of injected false triples that the agent accepted
    false_entries = [v for v in agent.validation_history if v['is_false']]
    false_acceptances = sum(1 for v in false_entries if v['decision'] == 'ACCEPT')
    false_positive_rate = false_acceptances / len(false_entries) if false_entries else 0
    
    # Get resolution statistics
    resolution_stats = agent.get_resolution_stats()
    
    performance = {
        'agent_id': agent.agent_id,
        'final_accuracy': metrics['accuracy'],
        'graph_nodes': stats['nodes'],
        'graph_edges': stats['edges'],
        'graph_density': stats['density'],
        'avg_clustering': stats['avg_clustering'],
        'connected_components': stats['connected_components'],
        'triples_processed': metrics['triples_processed'],
        'false_triples_detected': metrics['false_triples_detected'],
        'accept_rate': accept_rate,
        'reject_rate': reject_rate,
        'review_rate': review_rate,
        'false_positive_rate': false_positive_rate,
        'avg_degree': stats['avg_degree'],
        'resolution_rate': resolution_stats['resolution_rate'],
        'forced_decisions': resolution_stats['forced_decisions'],
        'pending_relations': resolution_stats['relations_pending']
    }
    
    agent_performance.append(performance)

# Convert to DataFrame for analysis
perf_df = pd.DataFrame(agent_performance)

print("\nAgent Performance Summary:")
print(perf_df[['agent_id', 'final_accuracy', 'graph_nodes', 'graph_edges', 
               'accept_rate', 'reject_rate', 'false_positive_rate', 
               'resolution_rate', 'forced_decisions']].to_string(index=False))

# Calculate performance statistics
print("\n\nPerformance Statistics Across All Agents:")
print(f"Average Accuracy: {perf_df['final_accuracy'].mean():.4f} ± {perf_df['final_accuracy'].std():.4f}")
print(f"Average Graph Size: {perf_df['graph_nodes'].mean():.1f} nodes, {perf_df['graph_edges'].mean():.1f} edges")
print(f"Average Accept Rate: {perf_df['accept_rate'].mean():.4f} ± {perf_df['accept_rate'].std():.4f}")
print(f"Average False Positive Rate: {perf_df['false_positive_rate'].mean():.4f} ± {perf_df['false_positive_rate'].std():.4f}")
print(f"Average Resolution Rate: {perf_df['resolution_rate'].mean():.4f} ± {perf_df['resolution_rate'].std():.4f}")
print(f"Total Forced Decisions: {perf_df['forced_decisions'].sum()}")
print(f"Remaining Pending Relations: {perf_df['pending_relations'].sum()}")
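
As a worked example of the false positive rate reported above: it is the number of injected false triples an agent accepted, divided by all injected false triples it validated. The sketch below uses an invented validation history with the same `is_false` and `decision` keys; `false_positive_rate` here is a hypothetical standalone function, not the agent's own method.

```python
def false_positive_rate(validation_history):
    """Accepted false triples divided by all false triples seen."""
    false_entries = [v for v in validation_history if v['is_false']]
    if not false_entries:
        return 0.0
    accepted = sum(1 for v in false_entries if v['decision'] == 'ACCEPT')
    return accepted / len(false_entries)

# Invented history: 4 false triples (1 accepted) and 2 true triples
history = [
    {'is_false': True,  'decision': 'ACCEPT'},
    {'is_false': True,  'decision': 'REJECT'},
    {'is_false': True,  'decision': 'REJECT'},
    {'is_false': True,  'decision': 'REVIEW'},
    {'is_false': False, 'decision': 'ACCEPT'},
    {'is_false': False, 'decision': 'ACCEPT'},
]

print(false_positive_rate(history))  # 0.25
```

True triples do not enter the calculation at all, so an agent that rejects everything scores 0 here; the accept rate has to be read alongside it.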

# SECTION 11: STATISTICAL ANALYSIS

In [None]:

print("\n" + "="*60)
print("STATISTICAL ANALYSIS & HYPOTHESIS TESTING")
print("="*60)

# 1. Test for Universal Semantic Structure
print("\n1. TESTING FOR UNIVERSAL SEMANTIC STRUCTURE")
print("-" * 50)

# Calculate overall similarity scores
overall_similarities = []
for metric, matrix in similarity_results.items():
    upper_triangle = matrix[np.triu_indices_from(matrix, k=1)]
    mean_sim = np.mean(upper_triangle)
    overall_similarities.append(mean_sim)
    print(f"{metric.upper()} - Mean similarity: {mean_sim:.4f}")

# Overall consistency score
consistency_score = np.mean(overall_similarities)
print(f"\nOVERALL CONSISTENCY SCORE: {consistency_score:.4f}")

# Interpretation
if consistency_score > 0.7:
    interpretation = "STRONG evidence for universal semantic structure"
elif consistency_score > 0.5:
    interpretation = "MODERATE evidence for universal semantic structure"
else:
    interpretation = "WEAK evidence for universal semantic structure"

print(f"INTERPRETATION: {interpretation}")

# 2. Statistical Tests
print("\n\n2. STATISTICAL SIGNIFICANCE TESTS")
print("-" * 50)

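# Test if similarities are significantly above a random baseline
from scipy.stats import ttest_1samp

# Assumed baseline, not measured; replace with an empirical estimate where possible
random_baseline = 0.2

for metric, matrix in similarity_results.items():
    upper_triangle = matrix[np.triu_indices_from(matrix, k=1)]
    # One-sided test (mean > baseline); `alternative` requires SciPy >= 1.6.
    # Pairwise scores share agents, so they are not independent and p-values are optimistic.
    t_stat, p_value = ttest_1samp(upper_triangle, random_baseline, alternative='greater')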
    
    print(f"\n{metric.upper()} vs Random Baseline:")
    print(f"  T-statistic: {t_stat:.4f}")
    print(f"  P-value: {p_value:.6f}")
    print(f"  Significant: {'Yes' if p_value < 0.05 else 'No'}")

# 3. Robustness Analysis
print("\n\n3. ROBUSTNESS TO FALSE TRIPLES")
print("-" * 50)

# Analyze correlation between false positive rate and graph similarity
# Calculate average similarity for each agent
agent_avg_similarities = []
for i in range(num_agents):
    similarities = []
    for j in range(num_agents):
        if i != j:
            # Average across all metrics
            avg_sim = np.mean([similarity_results[metric][i][j] 
                              for metric in similarity_results.keys()])
            similarities.append(avg_sim)
    agent_avg_similarities.append(np.mean(similarities))

# Correlation with false positive rate
corr_coef, p_value = pearsonr(perf_df['false_positive_rate'], agent_avg_similarities)
print(f"Correlation between False Positive Rate and Average Similarity:")
print(f"  Pearson correlation: {corr_coef:.4f}")
print(f"  P-value: {p_value:.6f}")
print(f"  Interpretation: {'Robust' if abs(corr_coef) < 0.3 else 'Sensitive'} to false triples")

# 4. Convergence Analysis
print("\n\n4. TRAINING CONVERGENCE ANALYSIS")
print("-" * 50)

# Analyze if agents converged to similar solutions
final_accuracies = perf_df['final_accuracy'].values
accuracy_variance = np.var(final_accuracies)
mean_accuracy = np.mean(final_accuracies)
accuracy_cv = np.std(final_accuracies) / mean_accuracy if mean_accuracy > 0 else 0  # Coefficient of variation

print(f"Final Accuracy Statistics:")
print(f"  Mean: {np.mean(final_accuracies):.4f}")
print(f"  Variance: {accuracy_variance:.6f}")
print(f"  Coefficient of Variation: {accuracy_cv:.4f}")
print(f"  Convergence: {'High' if accuracy_cv < 0.1 else 'Moderate' if accuracy_cv < 0.2 else 'Low'}")

# 5. Graph Structure Consistency
print("\n\n5. GRAPH STRUCTURE CONSISTENCY")
print("-" * 50)

# Analyze consistency in graph properties
structural_properties = ['graph_density', 'avg_clustering', 'avg_degree']
structural_consistency = {}

for prop in structural_properties:
    values = perf_df[prop].values
    cv = np.std(values) / np.mean(values) if np.mean(values) > 0 else 0
    structural_consistency[prop] = cv
    print(f"{prop.replace('_', ' ').title()}:")
    print(f"  Mean: {np.mean(values):.4f}")
    print(f"  CV: {cv:.4f}")
    print(f"  Consistency: {'High' if cv < 0.2 else 'Moderate' if cv < 0.5 else 'Low'}")
    print()

# Overall structural consistency
overall_structural_consistency = np.mean(list(structural_consistency.values()))
print(f"Overall Structural Consistency (lower is better): {overall_structural_consistency:.4f}")

print("\nStatistical analysis completed!")
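
The significance test compares each metric against a fixed baseline of 0.2. One way to ground that number is to simulate a null model. The sketch below assumes the simplest possible one: every agent independently keeps each edge of a shared pool with probability `keep_prob`. Under that assumption the expected edge Jaccard is approximately `keep_prob / (2 - keep_prob)`. This null model and the helper `simulated_jaccard_baseline` are illustrative assumptions, not the acceptance process the agents actually follow.

```python
import numpy as np

def simulated_jaccard_baseline(pool_size, keep_prob, trials=200, seed=0):
    """Mean Jaccard between two independent random subsets of a shared edge pool."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(trials):
        kept_a = rng.random(pool_size) < keep_prob
        kept_b = rng.random(pool_size) < keep_prob
        union = np.logical_or(kept_a, kept_b).sum()
        if union:
            scores.append(np.logical_and(kept_a, kept_b).sum() / union)
    return float(np.mean(scores))

keep_prob = 0.5
analytic = keep_prob / (2 - keep_prob)  # p^2 / (1 - (1 - p)^2), simplified
simulated = simulated_jaccard_baseline(pool_size=5000, keep_prob=keep_prob)

print(f"analytic={analytic:.3f}, simulated={simulated:.3f}")
```

Under this model the baseline depends strongly on the acceptance rate, so a single constant is unlikely to be appropriate for every metric; a per-metric baseline estimated from the observed accept rates would be a more defensible reference point.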

# SECTION 12: ENHANCED THEORY VALIDATION METRICS

In [None]:

print("\n" + "="*60)
print("ENHANCED THEORY VALIDATION METRICS")
print("="*60)

from collections import Counter

def structural_isomorphism_index(agents):
    """Mean Pearson correlation of sorted degree sequences (a necessary, not sufficient, signal of isomorphism)"""
    iso_scores = []
    
    for i in range(len(agents)):
        for j in range(i+1, len(agents)):
            # Get degree sequences
            deg_seq1 = sorted([d for n, d in agents[i].graph.degree()], reverse=True)
            deg_seq2 = sorted([d for n, d in agents[j].graph.degree()], reverse=True)
            
            if not deg_seq1 or not deg_seq2:
                continue
                
            # Pad sequences to same length
            max_len = max(len(deg_seq1), len(deg_seq2))
            deg_seq1.extend([0] * (max_len - len(deg_seq1)))
            deg_seq2.extend([0] * (max_len - len(deg_seq2)))
            
            # Calculate Pearson correlation
            if len(deg_seq1) > 1:
                correlation, _ = pearsonr(deg_seq1, deg_seq2)
                if not np.isnan(correlation):
                    iso_scores.append(correlation)
    
    return np.mean(iso_scores) if iso_scores else 0

def semantic_coherence_score(agents):
    """Measure if agents discover same semantic relationships"""
    print("  Calculating semantic relationship coherence...")
    
    # Get all relation types across agents
    all_relations = set()
    for agent in agents:
        for _, _, data in agent.graph.edges(data=True):
            if 'relation' in data:
                all_relations.add(data['relation'])
    
    coherence_scores = []
    
    for relation_type in all_relations:
        relation_graphs = []
        
        for agent in agents:
            # Extract edges for this relation type
            edges = set()
            for u, v, d in agent.graph.edges(data=True):
                if d.get('relation') == relation_type:
                    edges.add((u, v))
            relation_graphs.append(edges)
        
        # Calculate pairwise Jaccard similarity for this relation
        similarities = []
        for i in range(len(relation_graphs)):
            for j in range(i+1, len(relation_graphs)):
                if relation_graphs[i] or relation_graphs[j]:
                    intersection = len(relation_graphs[i] & relation_graphs[j])
                    union = len(relation_graphs[i] | relation_graphs[j])
                    jaccard = intersection / union if union > 0 else 0
                    similarities.append(jaccard)
        
        if similarities:
            coherence_scores.append(np.mean(similarities))
    
    return np.mean(coherence_scores) if coherence_scores else 0

def rejection_consistency(agents):
    """How consistently do agents reject the same false triples?"""
    print("  Analyzing false triple rejection consistency...")
    
    # triple -> {agent_id: latest decision}. Keeping one vote per agent prevents a triple
    # that one agent logged several times (e.g. REVIEW, then a forced decision) from
    # being counted as several agents.
    false_triple_decisions = {}
    
    for agent in agents:
        for entry in agent.validation_history:
            if entry['is_false']:
                triple_key = str(entry['triple'])
                false_triple_decisions.setdefault(triple_key, {})[agent.agent_id] = entry['decision']
    
    # Calculate consistency for each false triple seen by more than one agent
    consistencies = []
    for votes in false_triple_decisions.values():
        if len(votes) > 1:
            # Fraction of agents that made the most common decision
            decision_counts = Counter(votes.values())
            most_common_count = decision_counts.most_common(1)[0][1]
            consistency = most_common_count / len(votes)
            consistencies.append(consistency)
    
    return np.mean(consistencies) if consistencies else 0

def path_structure_convergence(agents, sample_nodes=50):
    """Do agents develop similar path structures?"""
    print("  Measuring path structure convergence...")
    
    # Get common nodes across all agents
    common_nodes = set(agents[0].graph.nodes())
    for agent in agents[1:]:
        common_nodes &= set(agent.graph.nodes())
    
    if len(common_nodes) < 10:
        return 0
    
    # Sort for a reproducible sample (set iteration order can vary between runs)
    sample_nodes_list = sorted(common_nodes, key=str)[:sample_nodes]
    
    # Node pairs for path analysis; identical for every agent pair, so build them once
    sample_pairs = [(sample_nodes_list[k], sample_nodes_list[l]) 
                   for k in range(min(10, len(sample_nodes_list))) 
                   for l in range(k+1, min(20, len(sample_nodes_list)))][:100]  # Limit for performance
    path_similarities = []
    
    for i in range(len(agents)):
        for j in range(i+1, len(agents)):
            node_path_sims = []
            
            for source, target in sample_pairs:
                try:
                    path1 = nx.shortest_path_length(agents[i].graph, source, target)
                    path2 = nx.shortest_path_length(agents[j].graph, source, target)
                    
                    # Path length similarity
                    if path1 > 0 and path2 > 0:
                        sim = 1 - abs(path1 - path2) / max(path1, path2)
                        node_path_sims.append(sim)
                except nx.NetworkXNoPath:
                    # If either agent lacks a path, similarity is 0
                    node_path_sims.append(0)
                except nx.NodeNotFound:
                    # Skip pairs whose nodes are missing rather than swallowing every error
                    continue
            
            if node_path_sims:
                path_similarities.append(np.mean(node_path_sims))
    
    return np.mean(path_similarities) if path_similarities else 0

def concept_clustering_similarity(agents):
    """Do agents cluster concepts similarly?"""
    print("  Computing concept clustering similarity...")
    
    # Get common nodes
    common_nodes = set(agents[0].graph.nodes())
    for agent in agents[1:]:
        common_nodes &= set(agent.graph.nodes())
    
    if len(common_nodes) < 20:
        return 0
    
    common_nodes = sorted(common_nodes, key=str)[:100]  # Limit for performance; sorted for reproducibility
    
    # Create feature vectors for each agent
    agent_features = []
    for agent in agents:
        # Build the undirected copy once per agent; rebuilding it per node is very slow on large graphs
        undirected = agent.graph.to_undirected()
        features = []
        for node in common_nodes:
            # Feature: neighbors count, degree, clustering coefficient
            neighbors = len(list(agent.graph.neighbors(node)))
            degree = agent.graph.degree(node)
            
            # Local clustering coefficient
            try:
                clustering = nx.clustering(undirected, node)
            except nx.NetworkXException:
                clustering = 0
                
            features.append([neighbors, degree, clustering])
        agent_features.append(features)
    
    # Compare clustering results
    clustering_scores = []
    n_clusters = min(5, len(common_nodes) // 4)  # Adaptive cluster count
    
    if n_clusters < 2:
        return 0
    
    for i in range(len(agents)):
        for j in range(i+1, len(agents)):
            try:
                # Ensure we have enough samples for clustering
                if len(agent_features[i]) < n_clusters or len(agent_features[j]) < n_clusters:
                    continue
                    
                kmeans1 = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
                kmeans2 = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
                
                labels1 = kmeans1.fit_predict(agent_features[i])
                labels2 = kmeans2.fit_predict(agent_features[j])
                
                ari = adjusted_rand_score(labels1, labels2)
                clustering_scores.append(ari)
            except Exception as e:
                continue
    
    return np.mean(clustering_scores) if clustering_scores else 0

print("Enhanced metric functions defined successfully!")

# Calculate Enhanced Metrics
print("\n" + "="*60)
print("CALCULATING ENHANCED VALIDATION METRICS")
print("="*60)

# Calculate all enhanced metrics
enhanced_metrics = {}

print("\n1. Structural Isomorphism Index...")
enhanced_metrics['structural_isomorphism'] = structural_isomorphism_index(agents)

print("\n2. Semantic Relationship Coherence...")
enhanced_metrics['semantic_coherence'] = semantic_coherence_score(agents)

print("\n3. False Triple Rejection Consistency...")
enhanced_metrics['rejection_consistency'] = rejection_consistency(agents)

print("\n4. Path Structure Convergence...")
enhanced_metrics['path_convergence'] = path_structure_convergence(agents)

print("\n5. Concept Clustering Similarity...")
enhanced_metrics['clustering_similarity'] = concept_clustering_similarity(agents)

# Calculate enhanced theory validation score
enhanced_metrics['weighted_validation'] = (
    0.4 * enhanced_metrics['structural_isomorphism'] +
    0.25 * enhanced_metrics['semantic_coherence'] +
    0.15 * enhanced_metrics['rejection_consistency'] +
    0.15 * enhanced_metrics['path_convergence'] +
    0.05 * enhanced_metrics['clustering_similarity']
)

print("\n" + "="*60)
print("ENHANCED METRICS RESULTS")
print("="*60)

for metric, value in enhanced_metrics.items():
    print(f"{metric.replace('_', ' ').title()}: {value:.4f}")

# Define evidence thresholds
evidence_thresholds = {
    'structural_isomorphism': 0.7,      # High structural similarity
    'semantic_coherence': 0.6,          # Consistent semantic relationships  
    'rejection_consistency': 0.8,       # Strong false-triple agreement
    'path_convergence': 0.5,            # Moderate path similarity
    'clustering_similarity': 0.4,       # Moderate clustering agreement
    'weighted_validation': 0.65         # Overall validation threshold
}

print(f"\n" + "="*60)
print("THEORY VALIDATION ASSESSMENT")
print("="*60)

evidence_count = 0
total_metrics = len(evidence_thresholds)

for metric, threshold in evidence_thresholds.items():
    value = enhanced_metrics[metric]
    meets_threshold = value >= threshold
    evidence_count += meets_threshold
    
    status = "✓ PASS" if meets_threshold else "✗ FAIL"
    print(f"{metric.replace('_', ' ').title()}: {value:.4f} (threshold: {threshold:.2f}) {status}")

# Overall assessment
evidence_strength = evidence_count / total_metrics

print(f"\nEvidence Strength: {evidence_count}/{total_metrics} metrics pass ({evidence_strength:.1%})")

if evidence_strength >= 0.8:
    final_assessment = "STRONG evidence for universal semantic structure theory"
elif evidence_strength >= 0.6:
    final_assessment = "MODERATE evidence for universal semantic structure theory"  
else:
    final_assessment = "WEAK evidence - requires further investigation"

print(f"\nFINAL ASSESSMENT: {final_assessment}")
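
A caveat on interpreting the structural isomorphism index, which carries the largest weight (0.4) in the validation score: it correlates degree sequences after sorting them, and two sorted sequences tend to correlate strongly even when the underlying graphs are unrelated. The sketch below illustrates this with two independent random graphs; the graph sizes and seeds are arbitrary choices made for the example.

```python
import networkx as nx
from scipy.stats import pearsonr

# Two independent random graphs with the same number of nodes and edges
g1 = nx.gnm_random_graph(200, 600, seed=1)
g2 = nx.gnm_random_graph(200, 600, seed=2)

# Same construction as the index: sort degrees in descending order, then correlate
deg1 = sorted((d for _, d in g1.degree()), reverse=True)
deg2 = sorted((d for _, d in g2.degree()), reverse=True)
degree_corr, _ = pearsonr(deg1, deg2)

# Edge overlap between the same two graphs (frozenset ignores edge direction)
edges1 = {frozenset(e) for e in g1.edges()}
edges2 = {frozenset(e) for e in g2.edges()}
edge_jaccard = len(edges1 & edges2) / len(edges1 | edges2)

print(f"sorted-degree correlation: {degree_corr:.3f}")
print(f"edge Jaccard:              {edge_jaccard:.3f}")
```

A high value on this index is therefore weak evidence on its own; it is more informative when read next to an edge-level metric such as Jaccard, or when compared against the value obtained from random graphs of matching size.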

# SECTION 13: VISUALIZATION

In [None]:
print("\n" + "="*60)
print("GENERATING VISUALIZATIONS")
print("="*60)

plt.style.use('seaborn-v0_8')

# List of plotting functions for each plot
plot_functions = []

# 1-6. Similarity Heatmaps (improved, one per figure, larger size)
for idx, (metric, matrix) in enumerate(similarity_results.items(), 1):
    def plot_heatmap(metric=metric, matrix=matrix):
        fig, ax = plt.subplots(figsize=(12, 10))  # Larger size
        mask = np.eye(len(matrix), dtype=bool)
        sns.heatmap(matrix, annot=True, fmt='.3f', cmap='coolwarm',
                    xticklabels=[f'A{i+1}' for i in range(num_agents)],
                    yticklabels=[f'A{i+1}' for i in range(num_agents)],
                    cbar=True, mask=mask, ax=ax, annot_kws={"size": 14})
        upper_triangle = matrix[np.triu_indices_from(matrix, k=1)]
        # Report the mean inside the title; a separate ax.text at y=1.05 overlaps the title
        ax.set_title(f'{metric.replace("_", " ").title()} Similarity (Mean: {np.mean(upper_triangle):.2f})', fontsize=20)
        ax.tick_params(axis='both', labelsize=14)
        plt.tight_layout()
        plt.show()
    plot_functions.append(plot_heatmap)

# 7. Training Progress (smoothed)
def plot_training_progress():
    fig, ax = plt.subplots(figsize=(14, 8))
    training_df = pd.DataFrame(training_history)
    for agent_id in training_df['agent_id'].unique():
        agent_data = training_df[training_df['agent_id'] == agent_id]
        if len(agent_data) > 5:
            smoothed = agent_data['accuracy'].rolling(window=5, min_periods=1).mean()
        else:
            smoothed = agent_data['accuracy']
        # 'iteration' restarts every epoch, so plot against the cumulative record count
        steps = np.arange(1, len(agent_data) + 1)
        ax.plot(steps, smoothed, label=agent_id, alpha=0.7, linewidth=3)
    ax.set_xlabel('Training Step (cumulative)', fontsize=16)
    ax.set_ylabel('Validation Accuracy', fontsize=16)
    ax.set_title('Training Progress by Agent (Smoothed)', fontsize=20)
    ax.legend(fontsize=14)
    ax.grid(True, alpha=0.3)
    ax.tick_params(axis='both', labelsize=14)
    plt.tight_layout()
    plt.show()
plot_functions.append(plot_training_progress)

# 8. Graph Size Distribution (trend line)
def plot_graph_size():
    fig, ax = plt.subplots(figsize=(12, 8))
    scatter = ax.scatter(perf_df['graph_nodes'], perf_df['graph_edges'], 
                        c=perf_df['final_accuracy'], cmap='viridis', s=200, alpha=0.8)
    plt.colorbar(scatter, ax=ax, label='Final Accuracy')
    ax.set_xlabel('Number of Nodes', fontsize=16)
    ax.set_ylabel('Number of Edges', fontsize=16)
    ax.set_title('Graph Size vs Accuracy', fontsize=20)
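    # Fit a linear trend only when node counts vary; the fit is degenerate otherwise
    if perf_df['graph_nodes'].nunique() > 1:
        z = np.polyfit(perf_df['graph_nodes'], perf_df['graph_edges'], 1)
        p = np.poly1d(z)
        x_sorted = np.sort(perf_df['graph_nodes'])
        ax.plot(x_sorted, p(x_sorted), "r--", alpha=0.5)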
    for i, agent_id in enumerate(perf_df['agent_id']):
        ax.annotate(agent_id.split('_')[1], 
                    (perf_df.iloc[i]['graph_nodes'], perf_df.iloc[i]['graph_edges']),
                    xytext=(8, 8), textcoords='offset points', fontsize=14)
    ax.tick_params(axis='both', labelsize=14)
    plt.tight_layout()
    plt.show()
plot_functions.append(plot_graph_size)

# 9. Decision Distribution (percent labels)
def plot_decision_distribution():
    fig, ax = plt.subplots(figsize=(12, 8))
    decision_data = perf_df[['accept_rate', 'reject_rate', 'review_rate']]
    decision_data.plot(kind='bar', stacked=True, ax=ax, color=['#4CAF50', '#F44336', '#FFC107'])
    ax.set_xlabel('Agent', fontsize=16)
    ax.set_ylabel('Decision Rate', fontsize=16)
    ax.set_title('Decision Distribution by Agent', fontsize=20)
    ax.set_xticks(range(len(perf_df)))
    ax.set_xticklabels([f'A{i+1}' for i in range(len(perf_df))], rotation=0, fontsize=14)
    ax.legend(['Accept', 'Reject', 'Review'], fontsize=14)
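    # Label each stacked segment; use the bar position, since the DataFrame
    # index is not guaranteed to be 0..n-1
    for pos, (_, row) in enumerate(decision_data.iterrows()):
        y = 0
        for val in row:
            ax.text(pos, y + val/2, f"{val*100:.1f}%", ha='center', va='center', fontsize=14, color='white')
            y += val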
    ax.tick_params(axis='both', labelsize=14)
    plt.tight_layout()
    plt.show()
plot_functions.append(plot_decision_distribution)

# 10. Similarity Distribution (mean/median lines)
def plot_similarity_distribution():
    fig, ax = plt.subplots(figsize=(14, 8))
    all_similarities = []
    metric_labels = []
    for metric, matrix in similarity_results.items():
        upper_triangle = matrix[np.triu_indices_from(matrix, k=1)]
        all_similarities.extend(upper_triangle)
        metric_labels.extend([metric] * len(upper_triangle))
    sim_df = pd.DataFrame({'similarity': all_similarities, 'metric': metric_labels})
    sns.boxplot(data=sim_df, x='metric', y='similarity', ax=ax)
    ax.set_xlabel('Similarity Metric', fontsize=16)
    ax.set_ylabel('Similarity Score', fontsize=16)
    ax.set_title('Distribution of Similarity Scores', fontsize=20)
    ax.tick_params(axis='x', labelrotation=45)
    for i, metric in enumerate(sim_df['metric'].unique()):
        vals = sim_df[sim_df['metric'] == metric]['similarity']
        ax.plot([i-0.2, i+0.2], [vals.mean()]*2, 'g-', lw=3)
        ax.plot([i-0.2, i+0.2], [vals.median()]*2, 'b--', lw=3)
    ax.tick_params(axis='both', labelsize=14)
    plt.tight_layout()
    plt.show()
plot_functions.append(plot_similarity_distribution)

# 11. Performance Correlation (fit line, corr coef)
def plot_performance_correlation():
    fig, ax = plt.subplots(figsize=(12, 8))
    x = perf_df['false_positive_rate']
    y = perf_df['final_accuracy']
    ax.scatter(x, y, s=200, alpha=0.8, c='red')
    # A fit line and correlation need at least two distinct x values
    if len(x) > 1 and x.nunique() > 1:
        z = np.polyfit(x, y, 1)
        p = np.poly1d(z)
        ax.plot(x, p(x), "b--", alpha=0.7, linewidth=3)
        corr = np.corrcoef(x, y)[0, 1]
        ax.text(0.05, 0.95, f"r={corr:.2f}", transform=ax.transAxes, fontsize=16, ha='left', va='top')
    ax.set_xlabel('False Positive Rate', fontsize=16)
    ax.set_ylabel('Final Accuracy', fontsize=16)
    ax.set_title('Accuracy vs False Positive Rate', fontsize=20)
    for i, agent_id in enumerate(perf_df['agent_id']):
        ax.annotate(agent_id.split('_')[1], 
                    (perf_df.iloc[i]['false_positive_rate'], perf_df.iloc[i]['final_accuracy']),
                    xytext=(8, 8), textcoords='offset points', fontsize=14)
    ax.tick_params(axis='both', labelsize=14)
    plt.tight_layout()
    plt.show()
plot_functions.append(plot_performance_correlation)

# 12. Graph Density Analysis (mean/std lines)
def plot_graph_density():
    fig, ax = plt.subplots(figsize=(12, 8))
    ax.hist(perf_df['graph_density'], bins=10, alpha=0.7, edgecolor='black')
    ax.set_xlabel('Graph Density', fontsize=16)
    ax.set_ylabel('Frequency', fontsize=16)
    ax.set_title('Distribution of Graph Densities', fontsize=20)
    ax.axvline(perf_df['graph_density'].mean(), color='red', linestyle='--', label=f'Mean: {perf_df["graph_density"].mean():.4f}')
    ax.axvline(perf_df['graph_density'].mean() + perf_df['graph_density'].std(), color='blue', linestyle=':', label='±1 Std')
    ax.axvline(perf_df['graph_density'].mean() - perf_df['graph_density'].std(), color='blue', linestyle=':')
    ax.legend(fontsize=14)
    ax.tick_params(axis='both', labelsize=14)
    plt.tight_layout()
    plt.show()
plot_functions.append(plot_graph_density)

# 13. Clustering Coefficient (mean line)
def plot_clustering_coefficient():
    fig, ax = plt.subplots(figsize=(12, 8))
    ax.bar(range(len(perf_df)), perf_df['avg_clustering'], alpha=0.7, color='#2196F3')
    ax.set_xlabel('Agent', fontsize=16)
    ax.set_ylabel('Average Clustering Coefficient', fontsize=16)
    ax.set_title('Clustering by Agent', fontsize=20)
    ax.set_xticks(range(len(perf_df)))
    ax.set_xticklabels([f'A{i+1}' for i in range(len(perf_df))], fontsize=14)
    ax.axhline(perf_df['avg_clustering'].mean(), color='red', linestyle='--', label='Mean')
    ax.legend(fontsize=14)
    ax.tick_params(axis='both', labelsize=14)
    plt.tight_layout()
    plt.show()
plot_functions.append(plot_clustering_coefficient)

# 14. Resolution Performance Analysis
def plot_resolution_performance():
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    
    # Left plot: Resolution rates by agent
    ax1.bar(range(len(perf_df)), perf_df['resolution_rate'], alpha=0.7, color='#4CAF50')
    ax1.set_xlabel('Agent', fontsize=14)
    ax1.set_ylabel('Resolution Rate', fontsize=14)
    ax1.set_title('Resolution Rate by Agent', fontsize=16)
    ax1.set_xticks(range(len(perf_df)))
    ax1.set_xticklabels([f'A{i+1}' for i in range(len(perf_df))], fontsize=12)
    ax1.axhline(perf_df['resolution_rate'].mean(), color='red', linestyle='--', 
                label=f'Mean: {perf_df["resolution_rate"].mean():.3f}')
    ax1.legend(fontsize=12)
    ax1.grid(True, alpha=0.3)
    
    # Add value labels on bars
    for i, v in enumerate(perf_df['resolution_rate']):
        ax1.text(i, v + 0.01, f'{v:.3f}', ha='center', va='bottom', fontsize=10)
    
    # Right plot: Forced decisions vs accuracy
    scatter = ax2.scatter(perf_df['forced_decisions'], perf_df['final_accuracy'], 
                         s=200, alpha=0.8, c=perf_df['resolution_rate'], cmap='viridis')
    plt.colorbar(scatter, ax=ax2, label='Resolution Rate')
    ax2.set_xlabel('Forced Decisions', fontsize=14)
    ax2.set_ylabel('Final Accuracy', fontsize=14)
    ax2.set_title('Forced Decisions vs Accuracy', fontsize=16)
    
    # Add correlation coefficient
    if len(perf_df['forced_decisions']) > 1:
        corr = np.corrcoef(perf_df['forced_decisions'], perf_df['final_accuracy'])[0, 1]
        ax2.text(0.05, 0.95, f'r = {corr:.3f}', transform=ax2.transAxes, 
                fontsize=12, bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))
    
    # Annotate points with agent names
    for i, agent_id in enumerate(perf_df['agent_id']):
        ax2.annotate(agent_id.split('_')[1], 
                    (perf_df.iloc[i]['forced_decisions'], perf_df.iloc[i]['final_accuracy']),
                    xytext=(5, 5), textcoords='offset points', fontsize=10)
    
    ax2.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
plot_functions.append(plot_resolution_performance)

# 15. Weighted vs Unweighted Jaccard (highlight outliers)
def plot_jaccard_comparison():
    fig, ax = plt.subplots(figsize=(12, 8))
    jaccard_upper = similarity_results['jaccard'][np.triu_indices_from(similarity_results['jaccard'], k=1)]
    weighted_jaccard_upper = similarity_results['weighted_jaccard'][np.triu_indices_from(similarity_results['weighted_jaccard'], k=1)]
    ax.scatter(jaccard_upper, weighted_jaccard_upper, alpha=0.8, s=200)
    ax.plot([0, 1], [0, 1], 'r--', alpha=0.8, label='Perfect correlation', linewidth=3)
    if len(jaccard_upper) > 0:
        diffs = np.abs(jaccard_upper - weighted_jaccard_upper)
        outlier_idx = np.argsort(diffs)[-2:]
        ax.scatter(np.array(jaccard_upper)[outlier_idx], np.array(weighted_jaccard_upper)[outlier_idx], color='orange', s=300, edgecolor='black', label='Outliers')
    ax.set_xlabel('Standard Jaccard Similarity', fontsize=16)
    ax.set_ylabel('Weighted Jaccard Similarity', fontsize=16)
    ax.set_title('Weighted vs Standard Jaccard', fontsize=20)
    ax.legend(fontsize=14)
    ax.grid(True, alpha=0.3)
    ax.tick_params(axis='both', labelsize=14)
    plt.tight_layout()
    plt.show()
plot_functions.append(plot_jaccard_comparison)

# Now, call each plotting function in sequence to display each plot in full size
for plot_func in plot_functions:
    plot_func()

print("Visualizations generated successfully!")
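
A note on the `metric=metric, matrix=matrix` defaults used by `plot_heatmap` above: Python closures look up loop variables when the function is called, not when it is defined. Because the plot functions are collected first and called later, every heatmap would otherwise show the last metric. The sketch below illustrates the difference in isolation; `build_callbacks` and the metric names are illustrative assumptions, not part of the experiment code.

```python
# Hypothetical stand-in for a loop that builds one deferred function per metric.
def build_callbacks(names):
    late, bound = [], []
    for name in names:
        # Late binding: `name` is looked up when the function is called,
        # so every function sees the loop's final value.
        late.append(lambda: name)
        # Default argument: the current value is stored at definition time.
        bound.append(lambda name=name: name)
    return late, bound


late, bound = build_callbacks(["jaccard", "weighted_jaccard", "cosine"])
late_results = [f() for f in late]
bound_results = [f() for f in bound]
print(late_results)   # ['cosine', 'cosine', 'cosine']
print(bound_results)  # ['jaccard', 'weighted_jaccard', 'cosine']
```

`functools.partial` would be an equivalent way to freeze the arguments if default values feel too implicit.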

# SECTION 14: ENHANCED VISUALIZATIONS

In [None]:

print("\n" + "="*60)
print("ENHANCED THEORY VALIDATION VISUALIZATIONS")
print("="*60)

# 1. Enhanced Metrics Radar Chart
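# A radar chart needs polar axes; on Cartesian axes the angles plot as a plain line chart
fig, ax1 = plt.subplots(figsize=(8, 7), subplot_kw={'projection': 'polar'})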
metrics_names = list(enhanced_metrics.keys())
metrics_values = list(enhanced_metrics.values())
angles = np.linspace(0, 2 * np.pi, len(metrics_names), endpoint=False)
values = metrics_values + [metrics_values[0]]
angles = np.concatenate((angles, [angles[0]]))
ax1.plot(angles, values, 'o-', linewidth=2, label='Achieved', color='navy')
ax1.fill(angles, values, alpha=0.25, color='navy')
thresholds = [evidence_thresholds.get(name, 0.5) for name in metrics_names] + [evidence_thresholds.get(metrics_names[0], 0.5)]
ax1.plot(angles, thresholds, '--', linewidth=2, color='red', label='Threshold')
ax1.fill_between(angles, values, thresholds, where=(np.array(values) < np.array(thresholds)), color='orange', alpha=0.2)
ax1.set_xticks(angles[:-1])
ax1.set_xticklabels([name.replace('_', '\n').title() for name in metrics_names], fontsize=9)

ax1.set_ylim(0, 1)
ax1.set_title('Theory Validation Radar')
ax1.legend()
ax1.grid(True)
max_idx = np.argmax(metrics_values)
min_idx = np.argmin(metrics_values)
ax1.annotate(f"Max: {metrics_names[max_idx].replace('_',' ').title()}\n{metrics_values[max_idx]:.2f}",
             (angles[max_idx], metrics_values[max_idx]), textcoords="offset points", xytext=(10,10), ha='left', color='green')
ax1.annotate(f"Min: {metrics_names[min_idx].replace('_',' ').title()}\n{metrics_values[min_idx]:.2f}",
             (angles[min_idx], metrics_values[min_idx]), textcoords="offset points", xytext=(-60,-20), ha='left', color='red')
plt.show()

# 2. Structural Isomorphism Heatmap
fig, ax2 = plt.subplots(figsize=(7, 6))
# Proxy for structural isomorphism: Pearson correlation between the agents'
# sorted degree sequences, zero-padded to equal length
degree_seqs = [sorted((d for _, d in agent.graph.degree()), reverse=True) for agent in agents]
iso_matrix = np.eye(len(agents))
for i in range(len(agents)):
    for j in range(i + 1, len(agents)):
        seq1, seq2 = degree_seqs[i], degree_seqs[j]
        max_len = max(len(seq1), len(seq2))
        if not seq1 or not seq2 or max_len < 2:
            continue
        padded1 = seq1 + [0] * (max_len - len(seq1))
        padded2 = seq2 + [0] * (max_len - len(seq2))
        corr, _ = pearsonr(padded1, padded2)
        # pearsonr returns NaN for a constant sequence; score that pair as 0.
        # The correlation is symmetric, so fill both halves of the matrix.
        iso_matrix[i, j] = iso_matrix[j, i] = 0.0 if np.isnan(corr) else corr
sns.heatmap(iso_matrix, annot=True, fmt='.3f', cmap='vlag', ax=ax2,
            xticklabels=[f'A{i+1}' for i in range(len(agents))],
            yticklabels=[f'A{i+1}' for i in range(len(agents))], cbar=True)
ax2.set_title(f'Structural Isomorphism Matrix\nMean: {np.mean(iso_matrix[np.triu_indices_from(iso_matrix, k=1)]):.2f}')
plt.show()

# 3. Evidence Strength Bar Chart
fig, ax3 = plt.subplots(figsize=(8, 6))
metric_names = [name.replace('_', ' ').title() for name in enhanced_metrics.keys()]
metric_values = list(enhanced_metrics.values())
threshold_values = [evidence_thresholds.get(name, 0.5) for name in enhanced_metrics.keys()]
x_pos = np.arange(len(metric_names))
bars = ax3.bar(x_pos, metric_values, alpha=0.7, color='steelblue', label='Achieved')
ax3.bar(x_pos, threshold_values, alpha=0.3, color='red', label='Threshold')
ax3.set_xlabel('Metrics')
ax3.set_ylabel('Score')
ax3.set_title('Evidence Strength by Metric')
ax3.set_xticks(x_pos)
ax3.set_xticklabels(metric_names, rotation=45, ha='right')
ax3.legend()
ax3.grid(True, alpha=0.3)
# Color each bar by pass/fail, annotate its value, and mark its own threshold
for bar, value, threshold in zip(bars, metric_values, threshold_values):
    bar.set_color('green' if value >= threshold else 'orange')
    ax3.text(bar.get_x() + bar.get_width()/2, value + 0.02, f"{value:.2f}", ha='center', va='bottom', fontsize=9)
    # Draw the threshold across this bar only; a full-width axhline per metric
    # would overlay every threshold on every bar
    ax3.hlines(threshold, bar.get_x(), bar.get_x() + bar.get_width(),
               colors='red', linestyles='--', linewidth=1, alpha=0.5)
plt.show()

# 4. Semantic Coherence by Relation Type
fig, ax4 = plt.subplots(figsize=(8, 6))
relation_coherence = {}
all_relations = set()
for agent in agents:
    for _, _, data in agent.graph.edges(data=True):
        if 'relation' in data:
            all_relations.add(data['relation'])
for relation_type in list(all_relations):
    relation_graphs = []
    for agent in agents:
        edges = set()
        for u, v, d in agent.graph.edges(data=True):
            if d.get('relation') == relation_type:
                edges.add((u, v))
        relation_graphs.append(edges)
    similarities = []
    for i in range(len(relation_graphs)):
        for j in range(i+1, len(relation_graphs)):
            if relation_graphs[i] or relation_graphs[j]:
                intersection = len(relation_graphs[i] & relation_graphs[j])
                union = len(relation_graphs[i] | relation_graphs[j])
                jaccard = intersection / union if union > 0 else 0
                similarities.append(jaccard)
    if similarities:
        relation_coherence[relation_type] = np.mean(similarities)
if relation_coherence:
    sorted_items = sorted(relation_coherence.items(), key=lambda x: x[1], reverse=True)
    relations = [k for k, v in sorted_items][:10]
    coherence_vals = [v for k, v in sorted_items][:10]
    bars = ax4.barh(relations, coherence_vals, alpha=0.7)
    ax4.set_xlabel('Coherence Score')
    ax4.set_title('Semantic Coherence by Relation Type (Top 10)')
    ax4.grid(True, alpha=0.3)
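    ax4.invert_yaxis()  # barh draws the first item at the bottom; put the highest score on top
    # Highlight the three highest (green) and three lowest (red) of the displayed relations
    for i, bar in enumerate(bars):
        if i < 3:
            bar.set_color('green')
        elif i >= len(bars) - 3:
            bar.set_color('red')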
plt.show()

# 5. Theory Validation Timeline
fig, ax5 = plt.subplots(figsize=(8, 6))
validation_scores = []
for i in range(1, len(training_history)//len(agents) + 1):
    iteration_data = [r for r in training_history if r.get('iteration') == i]
    if iteration_data:
        avg_accuracy = np.mean([r['accuracy'] for r in iteration_data])
        avg_nodes = np.mean([r['graph_nodes'] for r in iteration_data])
        # Composite score: mean of accuracy and graph size, with size capped at 1,000 nodes
        val_score = (avg_accuracy + min(avg_nodes/1000, 1)) / 2
        validation_scores.append(val_score)
if validation_scores:
    iterations = range(1, len(validation_scores) + 1)
    ax5.plot(iterations, validation_scores, linewidth=2, marker='o')
    ax5.axhline(y=0.65, color='red', linestyle='--', label='Validation Threshold')
    ax5.fill_between(iterations, 0.65, 1, color='red', alpha=0.08)
    ax5.set_xlabel('Training Iteration')
    ax5.set_ylabel('Validation Score')
    ax5.set_title('Theory Validation Over Time')
    # Annotate first crossing
    above = np.where(np.array(validation_scores) >= 0.65)[0]
    if len(above) > 0:
        # Offset the label in points so it stays inside the axes for short runs
        ax5.annotate(f"Threshold crossed at iter {above[0]+1}", (above[0]+1, validation_scores[above[0]]),
                     xytext=(20, 20), textcoords='offset points',
                     arrowprops=dict(facecolor='black', shrink=0.05), fontsize=9)
    ax5.legend()
    ax5.grid(True, alpha=0.3)
plt.show()

# 6. Final Assessment Summary
fig, ax6 = plt.subplots(figsize=(8, 6))
ax6.axis('off')
assessment_text = f"""
THEORY VALIDATION SUMMARY

Overall Consistency Score: {consistency_score:.3f}
Enhanced Validation Score: {enhanced_metrics['weighted_validation']:.3f}

Key Findings:
• Structural Isomorphism: {enhanced_metrics['structural_isomorphism']:.3f}
• Semantic Coherence: {enhanced_metrics['semantic_coherence']:.3f}  
• Rejection Consistency: {enhanced_metrics['rejection_consistency']:.3f}

Evidence Strength: {evidence_strength:.1%}
Assessment: {final_assessment}

Conclusion:
{"✓ Theory VALIDATED" if evidence_strength >= 0.6 else "⚠ Theory INCONCLUSIVE" if evidence_strength >= 0.4 else "✗ Theory NOT SUPPORTED"}
"""
# Add pass/fail icons for each key metric
def icon(value, threshold):
    return '🟢' if value >= threshold else '🔴'
summary_lines = [
    f"{icon(enhanced_metrics['structural_isomorphism'], evidence_thresholds['structural_isomorphism'])} Structural Isomorphism",
    f"{icon(enhanced_metrics['semantic_coherence'], evidence_thresholds['semantic_coherence'])} Semantic Coherence",
    f"{icon(enhanced_metrics['rejection_consistency'], evidence_thresholds['rejection_consistency'])} Rejection Consistency"
]
ax6.text(0.1, 0.9, assessment_text + '\n' + '\n'.join(summary_lines), transform=ax6.transAxes, fontsize=11,
         verticalalignment='top', fontfamily='monospace',
         bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.8))
plt.show()

print("Enhanced visualizations completed!")
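
The "Semantic Coherence by Relation Type" plot above scores each relation by the mean pairwise Jaccard overlap of the edges that agents hold for it. A minimal, dependency-free sketch of that calculation follows. The function name `relation_coherence` and the toy triples are assumptions made for illustration; the notebook itself reads edges from each agent's `networkx` graph.

```python
from itertools import combinations


def relation_coherence(agent_edges, relation):
    """Mean pairwise Jaccard overlap of the (head, tail) pairs agents hold for one relation."""
    edge_sets = [
        {(head, tail) for (head, rel, tail) in edges if rel == relation}
        for edges in agent_edges
    ]
    scores = []
    for a, b in combinations(edge_sets, 2):
        union = a | b
        if union:  # skip pairs where neither agent holds this relation
            scores.append(len(a & b) / len(union))
    return sum(scores) / len(scores) if scores else None


# Toy data (hypothetical): three agents with slightly different accepted triples
agent_edges = [
    [("cat", "IsA", "animal"), ("dog", "IsA", "animal"), ("cat", "HasA", "tail")],
    [("cat", "IsA", "animal"), ("dog", "IsA", "animal")],
    [("cat", "IsA", "animal"), ("cat", "HasA", "tail")],
]
isa = relation_coherence(agent_edges, "IsA")    # pair scores 1, 1/2, 1/2
hasa = relation_coherence(agent_edges, "HasA")  # pair scores 0, 1, 0
missing = relation_coherence(agent_edges, "PartOf")
print(isa, hasa, missing)
```

A pair in which only one agent holds the relation scores 0, so relations that few agents accept pull the mean down sharply. That is worth keeping in mind when reading the bottom of the chart.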

# SECTION 15: COMPREHENSIVE REPORT GENERATION

In [None]:

print("\n" + "="*60)
print("EXPERIMENT REPORT GENERATION")
print("="*60)

# Helper function to convert numpy types to native Python types for JSON serialization
def convert_for_json(obj):
    """Recursively convert numpy types and containers to JSON-serializable Python types."""
    if isinstance(obj, np.ndarray):
        # Check arrays before scalars: ndarray also exposes .item(), which
        # raises ValueError unless the array holds exactly one element
        return convert_for_json(obj.tolist())
    elif isinstance(obj, np.generic):  # numpy scalar (np.bool_, np.int64, np.float64, ...)
        return obj.item()
    elif isinstance(obj, dict):
        # json.dump rejects numpy scalars as keys, so convert those as well
        return {(key.item() if isinstance(key, np.generic) else key): convert_for_json(value)
                for key, value in obj.items()}
    elif isinstance(obj, (list, tuple)):
        return [convert_for_json(item) for item in obj]
    else:
        # Native bool, int, float, str, and None are already serializable
        return obj

# Create comprehensive experiment report
experiment_report = {
    'experiment_metadata': {
        'experiment_name': CONFIG['experiment_name'],
        'timestamp': datetime.now().isoformat(),
        'num_agents': CONFIG['num_agents'],
        'total_iterations': CONFIG['max_iterations'],
        'sample_size': CONFIG['sample_size'],
        'false_triple_ratio': CONFIG['false_triple_ratio'],
        'batch_size': CONFIG['batch_size']
    },
    
    'hypothesis_validation': {
        'overall_consistency_score': convert_for_json(consistency_score),
        'interpretation': interpretation,
        'evidence_strength': 'Strong' if consistency_score > 0.7 else 'Moderate' if consistency_score > 0.5 else 'Weak'
    },
    
    'similarity_analysis': {
        'metrics': {metric: {
            'mean': convert_for_json(summary_stats[metric]['mean']),
            'std': convert_for_json(summary_stats[metric]['std']),
            'min': convert_for_json(summary_stats[metric]['min']),
            'max': convert_for_json(summary_stats[metric]['max'])
        } for metric in summary_stats.keys()},
        'overall_similarity': convert_for_json(consistency_score)
    },
    
    'agent_performance': {
        'individual_agents': convert_for_json(agent_performance),
        'summary_statistics': {
            'mean_accuracy': convert_for_json(perf_df['final_accuracy'].mean()),
            'accuracy_std': convert_for_json(perf_df['final_accuracy'].std()),
            'mean_graph_size': {
                'nodes': convert_for_json(perf_df['graph_nodes'].mean()),
                'edges': convert_for_json(perf_df['graph_edges'].mean())
            },
            'mean_false_positive_rate': convert_for_json(perf_df['false_positive_rate'].mean())
        }
    },
    
    'robustness_analysis': {
        'false_triple_sensitivity': {
            'correlation_coefficient': convert_for_json(corr_coef),
            'p_value': convert_for_json(p_value),
            'robustness_assessment': 'Robust' if abs(corr_coef) < 0.3 else 'Sensitive'
        },
        'convergence_metrics': {
            'accuracy_variance': convert_for_json(accuracy_variance),
            'coefficient_of_variation': convert_for_json(accuracy_cv),
            'convergence_level': 'High' if accuracy_cv < 0.1 else 'Moderate' if accuracy_cv < 0.2 else 'Low'
        }
    },
    
    'structural_consistency': {
        'property_consistency': convert_for_json(structural_consistency),
        'overall_consistency': convert_for_json(overall_structural_consistency)
    },
    
    'enhanced_validation': {
        'structural_isomorphism_index': convert_for_json(enhanced_metrics['structural_isomorphism']),
        'semantic_coherence_score': convert_for_json(enhanced_metrics['semantic_coherence']),
        'rejection_consistency': convert_for_json(enhanced_metrics['rejection_consistency']),
        'path_structure_convergence': convert_for_json(enhanced_metrics['path_convergence']),
        'concept_clustering_similarity': convert_for_json(enhanced_metrics['clustering_similarity']),
        'weighted_validation_score': convert_for_json(enhanced_metrics['weighted_validation'])
    },
    
    'theory_assessment': {
        'evidence_strength': convert_for_json(evidence_strength),
        'metrics_passing_threshold': convert_for_json(evidence_count),
        'total_metrics_evaluated': convert_for_json(total_metrics),
        'final_assessment': final_assessment,
        'theory_validated': convert_for_json(evidence_strength >= 0.6),
        'isomorphism_evidence': convert_for_json(enhanced_metrics['structural_isomorphism'] >= 0.7),
        'semantic_consistency_evidence': convert_for_json(enhanced_metrics['semantic_coherence'] >= 0.6)
    },
    
    'conclusions': {
        'universal_structure_evidence': convert_for_json(consistency_score > 0.6),
        'agent_convergence': convert_for_json(accuracy_cv < 0.15),
        'robustness_to_noise': convert_for_json(abs(corr_coef) < 0.3),
        'theory_validation_score': convert_for_json((consistency_score + (1 - accuracy_cv) + (1 - abs(corr_coef))) / 3),
        'relational_meaning_validated': convert_for_json(enhanced_metrics['semantic_coherence'] > 0.5),
        'structural_isomorphism_confirmed': convert_for_json(enhanced_metrics['structural_isomorphism'] > 0.6),
        'self_correcting_behavior': convert_for_json(enhanced_metrics['rejection_consistency'] > 0.7),
        'emergent_logical_patterns': convert_for_json(enhanced_metrics['path_convergence'] > 0.4),
        'universal_semantic_principles': convert_for_json(enhanced_metrics['weighted_validation'] > 0.6),
        'theoretical_validation_strength': convert_for_json(evidence_strength)
    }
}

# Convert the entire report to ensure all nested values are JSON serializable
experiment_report = convert_for_json(experiment_report)

# Save experiment results
output_file = OUTPUT_PATH / f"multi_agent_experiment_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
with open(output_file, 'w') as f:
    json.dump(experiment_report, f, indent=2)

print(f"Experiment report saved to: {output_file}")

# Save detailed similarity results
similarity_file = OUTPUT_PATH / f"similarity_matrices_{datetime.now().strftime('%Y%m%d_%H%M%S')}.pkl"
with open(similarity_file, 'wb') as f:
    pickle.dump(similarity_results, f)

print(f"Similarity matrices saved to: {similarity_file}")

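# Save agent graphs
# lxml is optional: networkx's write_graphml falls back to the standard-library
# XML writer when lxml is missing, so nothing is installed at runtime.
try:
    import lxml  # noqa: F401
except ImportError:
    print("lxml not found; GraphML files will be written with the standard-library XML writer.")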

# Helper functions to make graph attributes GraphML-compatible
import collections.abc

def _to_graphml_value(value):
    """Return a value the GraphML writer accepts (str, int, float, or bool)."""
    if isinstance(value, np.ndarray):
        # GraphML has no list type, so store arrays as JSON text
        return json.dumps(value.tolist())
    if isinstance(value, np.generic):  # numpy scalar
        return value.item()
    if isinstance(value, (collections.abc.Mapping, list, tuple)):
        # Containers are also stored as JSON text
        return json.dumps(convert_for_json(value))
    return value

def convert_graph_attributes_to_native(G):
    """Convert node and edge attributes of G in place and return G."""
    for _, attrs in G.nodes(data=True):
        for key in list(attrs):
            attrs[key] = _to_graphml_value(attrs[key])
    for _, _, attrs in G.edges(data=True):
        for key in list(attrs):
            attrs[key] = _to_graphml_value(attrs[key])
    return G

# Use one timestamp so every graph file from this run shares the same suffix
graph_timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
for i, agent in enumerate(agents):
    graph_file = OUTPUT_PATH / f"agent_{i+1}_graph_{graph_timestamp}.graphml"
    G_native = convert_graph_attributes_to_native(agent.graph.copy())
    nx.write_graphml(G_native, graph_file)
    print(f"Agent {i+1} graph saved to: {graph_file}")

print("\nAll experiment data saved successfully!")
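
The save steps above build each filename with its own `datetime.now()` call, so files from a single run can end up with different timestamp suffixes. One possible way to keep a run's outputs grouped is to derive every path from a single timestamp. This is a sketch only: `build_output_paths`, its `now` parameter, and the `"results"` directory are hypothetical names, not part of the notebook.

```python
from datetime import datetime
from pathlib import Path


def build_output_paths(output_dir, num_agents, now=None):
    """Return all output paths for one run, sharing a single timestamp."""
    # `now` can be injected for reproducible names; defaults to the current time
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    output_dir = Path(output_dir)
    return {
        "report": output_dir / f"multi_agent_experiment_{stamp}.json",
        "similarity": output_dir / f"similarity_matrices_{stamp}.pkl",
        "graphs": [
            output_dir / f"agent_{i + 1}_graph_{stamp}.graphml"
            for i in range(num_agents)
        ],
    }


paths = build_output_paths("results", num_agents=3, now=datetime(2024, 1, 2, 3, 4, 5))
print(paths["report"].name)  # multi_agent_experiment_20240102_030405.json
```

With a shared suffix, all artifacts of one run can be collected with a single glob on the timestamp.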


In [None]:

print("\n" + "="*80)
print("EXPERIMENT SUMMARY AND CONCLUSIONS")
print("="*80)

print(f"""
### Key Findings

1. **Universal Structure Evidence**: assessed from the overall consistency score and the cross-agent similarity metrics
2. **Agent Convergence**: whether agents independently arrived at similar semantic structures
3. **Robustness to Noise**: how well agents maintained consistency despite false triple injection
4. **Theory Validation**: evidence for or against the hypothesis of universal meaning structures

### Results Summary

- **Training Mode**: {final_training_mode}
- **Overall Consistency Score**: {consistency_score:.4f}
- **Enhanced Validation Score**: {enhanced_metrics['weighted_validation']:.4f}
- **Evidence Strength**: {evidence_strength:.1%} ({evidence_count}/{total_metrics} metrics pass)
- **Final Assessment**: {final_assessment}

### Theoretical Implications

If agents consistently build similar semantic structures despite:
- Independent training
- No shared optimization
- Presence of false information
- Different random initializations

then this would support the hypothesis of an underlying universal theory of meaning that is:
- Self-reinforcing
- Contradiction-rejecting  
- Logically consistent
- Emergent from basic semantic relationships

### Scalability Notes

This experiment framework is designed to scale from small test datasets to the full 3M ConceptNet dataset:

- **Sample Size**: Easily adjustable via CONFIG['sample_size']
- **Agent Count**: Configurable number of agents for statistical power
- **Iteration Control**: Adjustable training iterations for thoroughness
- **Memory Management**: Efficient graph storage and processing

### Next Steps

1. **Scale Up**: Run with larger datasets (100K, 500K, full 3M triples)
2. **Parameter Tuning**: Optimize false triple ratios and validation thresholds
3. **Extended Metrics**: Add more sophisticated similarity measures
4. **Cross-Validation**: Implement k-fold validation across different data splits
5. **Temporal Analysis**: Study how similarity evolves during training

**Experiment completed successfully!** All results, graphs, and analyses have been saved to the output directory.
""")

print("="*80)
print("MULTI-AGENT SEMANTIC VALIDATION EXPERIMENT COMPLETE")
print("="*80)
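
Next step 4 in the summary above proposes k-fold validation across data splits. A minimal sketch of how triples might be partitioned for that purpose is given below, using only the standard library. `k_fold_splits`, the seed value, and the toy triples are assumptions for illustration; the notebook does not define a splitting scheme.

```python
import random


def k_fold_splits(items, k, seed=42):
    """Yield (train, test) lists for k folds over a shuffled copy of items."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and len(items)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG; global random state is untouched
    folds = [shuffled[i::k] for i in range(k)]  # strided slices give near-equal fold sizes
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Toy triples (hypothetical) standing in for a ConceptNet sample
triples = [(f"concept_{i}", "RelatedTo", f"concept_{i + 1}") for i in range(10)]
splits = list(k_fold_splits(triples, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Each triple appears in exactly one test fold, so training every agent on each `train` split and comparing the resulting graphs would show whether similarity holds across data subsets rather than on a single sample.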