# Lean 8 - Autonomous Agents for Theorem Proving

**Navigation** : [← Lean-7-LLM-Integration](Lean-7-LLM-Integration.ipynb) | [Index](Lean-1-Setup.ipynb) | [Lean-9-LeanDojo →](Lean-9-LeanDojo.ipynb)

---


## Introduction

This final notebook of the series explores building **multi-agent systems** that can prove mathematical theorems **autonomously**. It combines the techniques from the previous notebooks with agentic orchestration patterns.

The goal is to build a system that can:
1. Receive a theorem statement
2. Search Mathlib for relevant lemmas
3. Generate proof strategies
4. Formally verify them with Lean
5. Iterate until it succeeds

### Learning objectives

1. Design a multi-agent architecture for theorem proving
2. Implement specialized agents (search, generation, verification)
3. Orchestrate collaboration between agents
4. Manage feedback and improvement loops
5. Understand the techniques behind Harmonic Aristotle and APOLLO

### Prerequisites

- Notebooks **Lean-1** through **Lean-7** completed
- Basic knowledge of multi-agent systems
- LLM API key (optional for running the notebook)

### Estimated duration: 55-60 minutes

---

## Architecture of an Agentic System for Lean

### Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│                         LEAN AGENTIC SYSTEM                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────┐                                                │
│  │  ORCHESTRATOR   │  <- Coordinates all agents                     │
│  │     Agent       │                                                │
│  └────────┬────────┘                                                │
│           │                                                         │
│  ┌────────┼────────┬────────────────┐                               │
│  │        │        │                │                               │
│  v        v        v                v                               │
│ ┌──────┐ ┌──────┐ ┌──────┐      ┌──────┐                            │
│ │Search│ │Tactic│ │Proof │      │Memory│                            │
│ │Agent │ │Agent │ │Verify│      │Store │                            │
│ └──┬───┘ └──┬───┘ └──┬───┘      └──────┘                            │
│    │        │        │                                              │
│    v        v        v                                              │
│ ┌──────────────────────────────────────────────┐                    │
│ │                 LEAN KERNEL                  │                    │
│ │       (Formal verification + Mathlib)        │                    │
│ └──────────────────────────────────────────────┘                    │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

## 1. Theorem Search Agent

### 1.1 Role

The search agent scans Mathlib for lemmas relevant to the problem.

In [1]:
from dataclasses import dataclass
from typing import List, Optional
import json

@dataclass
class Lemma:
    """Represente un lemme Mathlib."""
    name: str
    statement: str
    namespace: str
    relevance_score: float = 0.0

class TheoremSearchAgent:
    """Agent de recherche de theoremes dans Mathlib."""
    
    def __init__(self, llm_client=None):
        self.llm = llm_client
        self.cache = {}  # Cache des recherches
    
    def search(self, goal: str, context: str = "") -> List[Lemma]:
        """
        Recherche des lemmes pertinents pour un but donne.
        
        Args:
            goal: Le but a prouver
            context: Contexte additionnel (hypotheses, etc.)
        
        Returns:
            Liste de lemmes tries par pertinence
        """
        # Verifier le cache
        cache_key = f"{goal}:{context}"
        if cache_key in self.cache:
            return self.cache[cache_key]
        
        # Analyser le but pour extraire les concepts
        concepts = self._extract_concepts(goal)
        
        # Rechercher dans Mathlib
        lemmas = self._search_mathlib(concepts)
        
        # Scorer par pertinence
        scored = self._score_lemmas(lemmas, goal)
        
        # Mettre en cache
        self.cache[cache_key] = scored
        
        return scored
    
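    def _extract_concepts(self, goal: str) -> List[str]:
        """Extract the mathematical concepts mentioned in the goal."""
        # Simplification: keyword matching on the goal text.
        # Limitation: a purely symbolic goal such as "n + 0 = n" contains none
        # of these keywords, so the search then returns no lemmas.
        keywords = ["add", "mul", "comm", "assoc", "zero", "one", "succ"]
        return [k for k in keywords if k in goal.lower()]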
    
    def _search_mathlib(self, concepts: List[str]) -> List[Lemma]:
        """Simule la recherche dans Mathlib."""
        # Base de lemmes simulee
        mathlib_lemmas = [
            Lemma("Nat.add_zero", "n + 0 = n", "Nat"),
            Lemma("Nat.zero_add", "0 + n = n", "Nat"),
            Lemma("Nat.add_comm", "n + m = m + n", "Nat"),
            Lemma("Nat.add_assoc", "(n + m) + k = n + (m + k)", "Nat"),
            Lemma("Nat.mul_comm", "n * m = m * n", "Nat"),
            Lemma("Nat.mul_assoc", "(n * m) * k = n * (m * k)", "Nat"),
        ]
        
        # Filtrer par concepts
        return [l for l in mathlib_lemmas 
                if any(c in l.name.lower() for c in concepts)]
    
    def _score_lemmas(self, lemmas: List[Lemma], goal: str) -> List[Lemma]:
        """Score les lemmes par pertinence."""
        for lemma in lemmas:
            # Score simple : correspondance de termes
            lemma.relevance_score = sum(
                1 for word in lemma.statement.split() 
                if word in goal
            ) / max(len(goal.split()), 1)
        
        return sorted(lemmas, key=lambda l: l.relevance_score, reverse=True)

# Test
search_agent = TheoremSearchAgent()
results = search_agent.search("n + 0 = n")
print("Lemmes trouves:")
for lemma in results:
    print(f"  {lemma.name}: {lemma.statement} (score: {lemma.relevance_score:.2f})")

Lemmes trouves:
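
The empty result above follows from the keyword matching: the goal `n + 0 = n` contains no keyword such as `add` or `zero`. One possible remedy, sketched here, is to map operators and literals to concept names before searching. The table `SYMBOL_CONCEPTS` and the function `extract_concepts` are hypothetical names introduced for illustration; they are not part of the notebook's agent.

```python
import re
from typing import List

# Hypothetical mapping from goal symbols to Mathlib-style name fragments.
SYMBOL_CONCEPTS = {
    "+": "add",
    "*": "mul",
    "0": "zero",
    "1": "one",
}
KEYWORDS = ["add", "mul", "comm", "assoc", "zero", "one", "succ"]

def extract_concepts(goal: str) -> List[str]:
    """Collect concepts from keywords and from symbols, without duplicates."""
    found: List[str] = [k for k in KEYWORDS if k in goal.lower()]
    # Tokenize into identifiers, numbers and single non-space symbols.
    for token in re.findall(r"[A-Za-z_]\w*|\d+|\S", goal):
        concept = SYMBOL_CONCEPTS.get(token)
        if concept and concept not in found:
            found.append(concept)
    return found

print(extract_concepts("n + 0 = n"))  # ['add', 'zero']
```

With the concepts `add` and `zero`, the name filter used in `_search_mathlib` would match `Nat.add_zero`, `Nat.zero_add`, `Nat.add_comm`, and `Nat.add_assoc` from the simulated lemma base.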


## 2. Tactic Generation Agent

### 2.1 Role

The tactic agent generates sequences of Lean tactics to prove the goal.

In [2]:
from enum import Enum
from typing import Tuple

class TacticType(Enum):
    DIRECT = "direct"       # exact, rfl
    REWRITE = "rewrite"     # rw, simp
    SPLIT = "split"         # constructor, cases
    INDUCTION = "induction" # induction, recursion
    AUTO = "auto"           # omega, ring, linarith

@dataclass
class TacticSuggestion:
    """Une suggestion de tactique avec son contexte."""
    tactic: str
    tactic_type: TacticType
    confidence: float
    explanation: str

class TacticGeneratorAgent:
    """Agent de generation de tactiques."""
    
    def __init__(self, llm_client=None):
        self.llm = llm_client
        self.history = []  # Historique des tentatives
    
    def generate(self, goal: str, context: List[str], 
                 available_lemmas: List[Lemma]) -> List[TacticSuggestion]:
        """
        Genere des tactiques pour un but donne.
        
        Args:
            goal: Le but courant
            context: Les hypotheses disponibles
            available_lemmas: Lemmes suggeres par l'agent de recherche
        
        Returns:
            Liste de suggestions de tactiques
        """
        suggestions = []
        
        # Strategie 1: Tactiques directes
        if "=" in goal:
            suggestions.append(TacticSuggestion(
                "rfl", TacticType.DIRECT, 0.9,
                "Reflexivite - verifie si les deux cotes sont identiques"
            ))
        
        # Strategie 2: Utiliser les lemmes disponibles
        for lemma in available_lemmas[:3]:
            suggestions.append(TacticSuggestion(
                f"exact {lemma.name}", TacticType.DIRECT, 
                lemma.relevance_score,
                f"Appliquer {lemma.name}: {lemma.statement}"
            ))
            suggestions.append(TacticSuggestion(
                f"rw [{lemma.name}]", TacticType.REWRITE,
                lemma.relevance_score * 0.8,
                f"Reecrire avec {lemma.name}"
            ))
        
        # Strategie 3: Tactiques automatiques
        if any(op in goal for op in ["+", "-", "<", ">", "<=", ">="]):
            suggestions.append(TacticSuggestion(
                "omega", TacticType.AUTO, 0.7,
                "Arithmetique de Presburger automatique"
            ))
        
        if "*" in goal or "^" in goal:
            suggestions.append(TacticSuggestion(
                "ring", TacticType.AUTO, 0.7,
                "Algebre polynomiale automatique"
            ))
        
        # Strategie 4: Simp comme fallback
        suggestions.append(TacticSuggestion(
            "simp", TacticType.REWRITE, 0.5,
            "Simplification automatique"
        ))
        
        # Trier par confiance
        return sorted(suggestions, key=lambda s: s.confidence, reverse=True)
    
    def generate_sequence(self, goal: str, context: List[str],
                          available_lemmas: List[Lemma],
                          max_depth: int = 5) -> List[str]:
        """
        Genere une sequence complete de tactiques.
        """
        sequence = []
        current_goal = goal
        
        for _ in range(max_depth):
            suggestions = self.generate(current_goal, context, available_lemmas)
            if not suggestions:
                break
            
            best = suggestions[0]
            sequence.append(best.tactic)
            
            # Simuler la progression (dans la realite, Lean nous dirait le nouveau but)
            if best.tactic_type == TacticType.DIRECT:
                break  # Preuve complete
        
        return sequence

# Test
tactic_agent = TacticGeneratorAgent()
lemmas = search_agent.search("n + 0 = n")
suggestions = tactic_agent.generate("n + 0 = n", [], lemmas)

print("Tactiques suggerees:")
for s in suggestions[:5]:
    print(f"  [{s.confidence:.2f}] {s.tactic} - {s.explanation}")

Tactiques suggerees:
  [0.90] rfl - Reflexivite - verifie si les deux cotes sont identiques
  [0.70] omega - Arithmetique de Presburger automatique
  [0.50] simp - Simplification automatique
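
For reference, the suggested tactics can be checked directly in Lean. The block below is a small sketch, assuming a recent Lean 4 toolchain in which `omega` is available without extra imports; each line proves the same goal with a different tactic, plus the `exact` form that the agent would emit once the search returns `Nat.add_zero`.

```lean
-- Each tactic closes the goal `n + 0 = n` on its own.
example (n : Nat) : n + 0 = n := by rfl
example (n : Nat) : n + 0 = n := by omega
example (n : Nat) : n + 0 = n := by simp
example (n : Nat) : n + 0 = n := by exact Nat.add_zero n
```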


## 3. Verification Agent

### 3.1 Role

The verification agent runs the Lean code and analyzes the results.

In [3]:
@dataclass
class VerificationResult:
    """Resultat de la verification Lean."""
    success: bool
    error_message: Optional[str] = None
    remaining_goals: Optional[List[str]] = None
    execution_time: float = 0.0

class ProofVerifierAgent:
    """Agent de verification des preuves."""
    
    def __init__(self, lean_path: str = "lean"):
        self.lean_path = lean_path
        self.verified_count = 0
        self.failed_count = 0
    
    def verify(self, theorem: str, proof: str) -> VerificationResult:
        """
        Verifie une preuve avec Lean.
        
        Args:
            theorem: L'enonce du theoreme
            proof: La preuve proposee (sequence de tactiques)
        
        Returns:
            Resultat de la verification
        """
        # Construire le code Lean complet
        lean_code = self._build_lean_code(theorem, proof)
        
        # Simuler l'execution Lean
        # (Dans un vrai systeme, on utiliserait subprocess ou lean-dojo)
        result = self._simulate_lean_execution(lean_code)
        
        # Mettre a jour les statistiques
        if result.success:
            self.verified_count += 1
        else:
            self.failed_count += 1
        
        return result
    
    def _build_lean_code(self, theorem: str, proof: str) -> str:
        """Construit le code Lean complet."""
        return f"""
{theorem} := by
  {proof}
        """.strip()
    
    def _simulate_lean_execution(self, code: str) -> VerificationResult:
        """
        Simule l'execution Lean.
        Dans un vrai systeme, utiliser lean-dojo ou subprocess.
        """
        # Heuristiques simples pour la simulation
        if "rfl" in code or "exact Nat.add_zero" in code:
            return VerificationResult(success=True)
        elif "sorry" in code:
            return VerificationResult(
                success=False,
                error_message="declaration uses 'sorry'"
            )
        else:
            # Simuler une reussite aleatoire
            import random
            if random.random() > 0.3:
                return VerificationResult(success=True)
            else:
                return VerificationResult(
                    success=False,
                    error_message="tactic failed"
                )
    
    def get_stats(self) -> dict:
        """Retourne les statistiques de verification."""
        total = self.verified_count + self.failed_count
        return {
            "verified": self.verified_count,
            "failed": self.failed_count,
            "success_rate": self.verified_count / max(total, 1)
        }

# Test
verifier = ProofVerifierAgent()
result = verifier.verify(
    "theorem test (n : Nat) : n + 0 = n",
    "exact Nat.add_zero n"
)
print(f"Verification: {'Succes' if result.success else 'Echec'}")
if result.error_message:
    print(f"Erreur: {result.error_message}")

Verification: Succes
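
The verifier above only simulates Lean. A real check could be sketched as follows, assuming a `lean` executable on the `PATH` and a proof that needs no Mathlib import (with Mathlib, the command would typically be run inside a Lake project, for example through `lake env lean`). The helper names `build_lean_source` and `check_with_lean` are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List, Tuple

def build_lean_source(theorem: str, tactics: List[str]) -> str:
    """Assemble a Lean 4 declaration from a statement and a list of tactics."""
    body = "\n".join(f"  {t}" for t in tactics)
    return f"{theorem} := by\n{body}\n"

def check_with_lean(source: str, lean_cmd: str = "lean",
                    timeout_s: float = 60.0) -> Tuple[bool, str]:
    """Run Lean on a temporary file and return (success, diagnostics)."""
    if shutil.which(lean_cmd) is None:
        return False, f"executable not found: {lean_cmd}"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "Attempt.lean"
        path.write_text(source, encoding="utf-8")
        try:
            proc = subprocess.run([lean_cmd, str(path)], capture_output=True,
                                  text=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return False, "timeout"
    diagnostics = (proc.stdout + proc.stderr).strip()
    # 'sorry' is reported as a warning only, so reject it explicitly.
    ok = proc.returncode == 0 and "declaration uses 'sorry'" not in diagnostics
    return ok, diagnostics

source = build_lean_source("theorem test (n : Nat) : n + 0 = n",
                           ["exact Nat.add_zero n"])
print(source)
```

Calling `check_with_lean(source)` then returns the success flag together with Lean's diagnostics, which a verifier agent could store in `error_message`.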


## 4. Orchestrator Agent

### 4.1 Role

The orchestrator coordinates all the agents to solve a problem.

In [4]:
@dataclass
class ProofAttempt:
    """Enregistre une tentative de preuve."""
    theorem: str
    tactics: List[str]
    result: VerificationResult
    iteration: int

class OrchestratorAgent:
    """
    Agent orchestrateur qui coordonne le systeme multi-agents.
    """
    
    def __init__(self):
        self.search_agent = TheoremSearchAgent()
        self.tactic_agent = TacticGeneratorAgent()
        self.verifier = ProofVerifierAgent()
        self.history: List[ProofAttempt] = []
        self.max_iterations = 10
    
    def prove(self, theorem: str) -> Tuple[bool, Optional[str]]:
        """
        Tente de prouver un theoreme.
        
        Args:
            theorem: L'enonce du theoreme
        
        Returns:
            (succes, preuve) ou (echec, None)
        """
        print(f"\n{'='*60}")
        print(f"Debut de la preuve: {theorem}")
        print(f"{'='*60}\n")
        
        for iteration in range(self.max_iterations):
            print(f"--- Iteration {iteration + 1} ---")
            
            # Etape 1: Rechercher des lemmes pertinents
            goal = self._extract_goal(theorem)
            lemmas = self.search_agent.search(goal)
            print(f"Lemmes trouves: {[l.name for l in lemmas[:3]]}")
            
            # Etape 2: Generer des tactiques
            tactics = self.tactic_agent.generate_sequence(
                goal, [], lemmas
            )
            proof = "\n  ".join(tactics)
            print(f"Tactiques generees: {tactics}")
            
            # Etape 3: Verifier
            result = self.verifier.verify(theorem, proof)
            
            # Enregistrer la tentative
            self.history.append(ProofAttempt(
                theorem, tactics, result, iteration
            ))
            
            if result.success:
                print(f"\nPreuve trouvee!")
                return True, proof
            else:
                print(f"Echec: {result.error_message}")
                # Apprendre de l'echec pour la prochaine iteration
                self._learn_from_failure(result)
        
        print(f"\nEchec apres {self.max_iterations} iterations")
        return False, None
    
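    def _extract_goal(self, theorem: str) -> str:
        """Extract the goal: the text after the first top-level ':'."""
        # A plain split(":", 1) would cut inside the binder "(n : Nat)",
        # so colons nested in (), [] or {} are skipped, as is ":=".
        depth = 0
        for i, ch in enumerate(theorem):
            if ch in "([{":
                depth += 1
            elif ch in ")]}":
                depth -= 1
            elif ch == ":" and depth == 0 and theorem[i + 1:i + 2] != "=":
                return theorem[i + 1:].strip()
        return theorem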
    
    def _learn_from_failure(self, result: VerificationResult):
        """Ajuste la strategie basee sur l'echec."""
        # Dans un vrai systeme, on ajusterait les poids,
        # eviterait les tactiques qui echouent, etc.
        pass
    
    def get_statistics(self) -> dict:
        """Retourne les statistiques du systeme."""
        return {
            "total_attempts": len(self.history),
            "verifier_stats": self.verifier.get_stats()
        }

# Demonstration
orchestrator = OrchestratorAgent()
success, proof = orchestrator.prove(
    "theorem add_zero (n : Nat) : n + 0 = n"
)

if success:
    print(f"\nPreuve finale:\n{proof}")


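============================================================
Debut de la preuve: theorem add_zero (n : Nat) : n + 0 = n
============================================================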

--- Iteration 1 ---
Lemmes trouves: []
Tactiques generees: ['rfl']

Preuve trouvee!

Preuve finale:
rfl
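
In the orchestrator, `_learn_from_failure` is a stub, so a failed iteration would propose the same tactics again. A minimal way to use the feedback, sketched here with a hypothetical `FailureMemory` class, is to remember the tactics that failed and skip them when choosing from the ranked suggestions.

```python
from typing import List, Optional, Set

class FailureMemory:
    """Remembers tactics that failed so they are not proposed again."""

    def __init__(self) -> None:
        self.failed: Set[str] = set()

    def record_failure(self, tactic: str) -> None:
        self.failed.add(tactic)

    def pick(self, ranked_tactics: List[str]) -> Optional[str]:
        """Return the best-ranked tactic not yet known to fail, or None."""
        for tactic in ranked_tactics:
            if tactic not in self.failed:
                return tactic
        return None

memory = FailureMemory()
ranked = ["rfl", "omega", "simp"]

first = memory.pick(ranked)    # best candidate
memory.record_failure(first)   # suppose Lean rejected it
second = memory.pick(ranked)   # next candidate after the failure
print(first, second)           # rfl omega
```

A fuller version would key the memory on the pair (goal, tactic), since a tactic that fails on one goal may succeed on another.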


## 5. Integration with Semantic Kernel (Python)

### 5.1 Overview

Microsoft **Semantic Kernel** is an SDK for orchestrating LLMs with plugins, memory, and agents. We will implement a multi-agent theorem-proving system inspired by the patterns used for argument analysis (see the `Argument_Analysis` notebooks).

**Key components**:
- **Kernel**: the main entry point, which configures the LLM services
- **Plugins**: functions that agents can call (decorated with `@kernel_function`)
- **Agents**: autonomous entities with instructions and capabilities
- **Orchestration**: strategies for selecting agents and deciding when to stop
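
As a rough illustration of the Kernel and Plugins components, the sketch below declares a tiny plugin and registers it on a kernel. `LemmaNotesPlugin` and its method are hypothetical; the fallback decorator is an assumption that keeps the example runnable when Semantic Kernel is not installed.

```python
try:
    from semantic_kernel import Kernel
    from semantic_kernel.functions import kernel_function
    SK_AVAILABLE = True
except ImportError:
    SK_AVAILABLE = False

    def kernel_function(description="", name=None):
        """Fallback no-op decorator used when Semantic Kernel is missing."""
        def decorator(func):
            return func
        return decorator

class LemmaNotesPlugin:
    """Hypothetical plugin: stores lemma names that agents want to keep."""

    def __init__(self):
        self.notes = []

    @kernel_function(description="Remember a lemma name for later use",
                     name="remember_lemma")
    def remember_lemma(self, name: str) -> str:
        self.notes.append(name)
        return f"Stored: {name}"

plugin = LemmaNotesPlugin()
if SK_AVAILABLE:
    kernel = Kernel()
    # Exposes the decorated methods of the object under one plugin name.
    kernel.add_plugin(plugin, plugin_name="lemma_notes")

# The method remains an ordinary Python method and can be called directly.
print(plugin.remember_lemma("Nat.add_zero"))
```

Keeping the plugin state in a plain Python object means the same instance can be inspected outside the kernel, which is convenient for tests.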

### 5.2 Dependencies

```bash
# Installation
pip install semantic-kernel openai python-dotenv
```

In [5]:
# =============================================================================
# Section 8.1 - ProofState: Etat Partage pour Multi-Agents
# =============================================================================
# Pattern inspire de RhetoricalAnalysisState dans Argument_Analysis
# Permet la synchronisation entre agents avec designation explicite

import os
import sys
import json
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Any, Tuple
from datetime import datetime
from enum import Enum
from pathlib import Path

# --- Charger .env pour les cles API ---
try:
    from dotenv import load_dotenv
    env_paths = [Path(".env"), Path("../.env"), Path(__file__).parent / ".env" if "__file__" in dir() else Path(".env")]
    for p in env_paths:
        if p.exists():
            load_dotenv(p)
            print(f"Configuration chargee depuis: {p.absolute()}")
            break
except ImportError:
    pass

# --- Importer lean_runner.py ---
lean_dir = Path(".").absolute()
if str(lean_dir) not in sys.path:
    sys.path.insert(0, str(lean_dir))

from lean_runner import LeanRunner, LeanResult

# --- Enumerations ---

class ProofStrategy(Enum):
    """Strategie de preuve en cours."""
    EXPLORATION = "exploration"      # Recherche initiale de lemmes
    REFINEMENT = "refinement"        # Affinage des tactiques
    VALIDATION = "validation"        # Verification finale
    RECOVERY = "recovery"            # Recuperation apres echecs

class TacticDifficulty(Enum):
    """Niveau de difficulte des tactiques."""
    SIMPLE = "simple"      # rfl, exact, omega
    MEDIUM = "medium"      # induction, cases, simp
    COMPLEX = "complex"    # ring, linarith, aesop, custom

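# --- Dataclasses for the sub-components ---
# Note: Lemma and VerificationResult below redefine (shadow) the classes of the
# same name from sections 1 and 3, with different fields. After this cell runs,
# the agents from sections 1-4 no longer work until their cells are re-executed.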

@dataclass
class Lemma:
    """Lemme decouvert par SearchAgent."""
    name: str
    statement: str
    namespace: str
    relevance: float  # 0.0-1.0
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

@dataclass
class TacticAttempt:
    """Tentative de tactique par TacticAgent."""
    id: str
    tactic: str
    state_before: str           # Goals avant cette tactique
    confidence: float           # 0.0-1.0
    explanation: str
    iteration: int
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

@dataclass
class VerificationResult:
    """Resultat de verification par VerifierAgent."""
    attempt_id: str             # Reference a TacticAttempt
    success: bool
    output: str
    errors: str
    remaining_goals: Optional[str]
    exec_time_ms: float
    backend: str
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

# --- ProofState Principal ---

@dataclass
class ProofState:
    """
    Etat partage d'une session de preuve multi-agents.

    Architecture:
    - SearchAgent decouvre les lemmes
    - TacticAgent genere des tactiques
    - VerifierAgent verifie avec Lean
    - CriticAgent analyse les echecs
    - CoordinatorAgent supervise

    Pattern cle: Designation explicite de l'agent suivant (_next_agent_designated)
    """

    # === Theoreme ===
    theorem_statement: str = ""
    theorem_goal: str = ""
    theorem_context: Dict[str, Any] = field(default_factory=dict)

    # === Decouvertes ===
    discovered_lemmas: Dict[str, Lemma] = field(default_factory=dict)
    tactics_history: List[TacticAttempt] = field(default_factory=list)
    current_proof_state: Optional[str] = None  # Goals Lean restants

    # === Verifications ===
    verification_results: Dict[str, VerificationResult] = field(default_factory=dict)

    # === Coordination ===
    current_strategy: ProofStrategy = ProofStrategy.EXPLORATION
    iteration_count: int = 0
    max_iterations: int = 50
    _next_agent_designated: Optional[str] = None

    # === Conclusion ===
    proof_complete: bool = False
    final_proof: Optional[str] = None

    # === Metriques ===
    start_time: str = field(default_factory=lambda: datetime.now().isoformat())
    total_llm_tokens: int = 0
    total_lean_time_ms: float = 0.0

    # --- Methodes de modification d'etat ---

    def add_lemma(self, name: str, statement: str, namespace: str = "", relevance: float = 0.5) -> str:
        """Ajoute un lemme decouvert. Retourne son ID."""
        lemma_id = f"lemma_{len(self.discovered_lemmas) + 1}_{uuid.uuid4().hex[:4]}"
        self.discovered_lemmas[lemma_id] = Lemma(
            name=name, statement=statement, namespace=namespace, relevance=relevance
        )
        return lemma_id

    def add_tactic_attempt(
        self, tactic: str, state_before: str, confidence: float = 0.5, explanation: str = ""
    ) -> str:
        """Enregistre une tentative de tactique. Retourne son ID."""
        attempt_id = f"tactic_{len(self.tactics_history) + 1}_{uuid.uuid4().hex[:4]}"
        self.tactics_history.append(TacticAttempt(
            id=attempt_id, tactic=tactic, state_before=state_before,
            confidence=confidence, explanation=explanation, iteration=self.iteration_count
        ))
        return attempt_id

    def add_verification(
        self, attempt_id: str, success: bool, output: str, errors: str,
        remaining_goals: Optional[str], exec_time_ms: float, backend: str
    ) -> str:
        """Enregistre un resultat de verification. Retourne son ID."""
        verif_id = f"verif_{len(self.verification_results) + 1}_{uuid.uuid4().hex[:4]}"
        self.verification_results[verif_id] = VerificationResult(
            attempt_id=attempt_id, success=success, output=output, errors=errors,
            remaining_goals=remaining_goals, exec_time_ms=exec_time_ms, backend=backend
        )
        self.total_lean_time_ms += exec_time_ms
        return verif_id

    def designate_next_agent(self, agent_name: str):
        """Designe l'agent qui doit agir au prochain tour."""
        self._next_agent_designated = agent_name

    def consume_next_agent_designation(self) -> Optional[str]:
        """Recupere et efface la designation."""
        designation = self._next_agent_designated
        self._next_agent_designated = None
        return designation

    def set_proof_complete(self, proof: str):
        """Marque la preuve comme terminee."""
        self.proof_complete = True
        self.final_proof = proof

    def increment_iteration(self):
        """Incremente le compteur d'iterations."""
        self.iteration_count += 1

    def set_strategy(self, strategy: ProofStrategy):
        """Change la strategie de preuve."""
        self.current_strategy = strategy

    def update_proof_state(self, goals: str):
        """Met a jour les goals Lean restants."""
        self.current_proof_state = goals

    # --- Methodes de lecture ---

    def get_recent_failures(self, n: int = 3) -> List[Tuple[TacticAttempt, VerificationResult]]:
        """Retourne les N derniers echecs pour analyse."""
        failures = []
        for attempt in reversed(self.tactics_history):
            for verif in self.verification_results.values():
                if verif.attempt_id == attempt.id and not verif.success:
                    failures.append((attempt, verif))
                    if len(failures) >= n:
                        return failures
        return failures

    def get_successful_tactics(self) -> List[str]:
        """Retourne les tactiques qui ont fonctionne."""
        successful_ids = {v.attempt_id for v in self.verification_results.values() if v.success}
        return [a.tactic for a in self.tactics_history if a.id in successful_ids]

    def get_state_snapshot(self, summarize: bool = True) -> Dict[str, Any]:
        """Retourne un snapshot de l'etat (JSON-serializable)."""
        if summarize:
            return {
                "theorem": self.theorem_statement[:100] + "..." if len(self.theorem_statement) > 100 else self.theorem_statement,
                "strategy": self.current_strategy.value,
                "iteration": self.iteration_count,
                "lemmas_found": len(self.discovered_lemmas),
                "tactics_tried": len(self.tactics_history),
                "verifications": len(self.verification_results),
                "proof_complete": self.proof_complete,
                "current_goals": self.current_proof_state[:200] if self.current_proof_state else None,
                "recent_failures": len(self.get_recent_failures(5))
            }
        else:
            # Snapshot complet
            return {
                "theorem_statement": self.theorem_statement,
                "theorem_goal": self.theorem_goal,
                "discovered_lemmas": {k: vars(v) for k, v in self.discovered_lemmas.items()},
                "tactics_history": [vars(t) for t in self.tactics_history],
                "verification_results": {k: vars(v) for k, v in self.verification_results.items()},
                "current_strategy": self.current_strategy.value,
                "iteration_count": self.iteration_count,
                "proof_complete": self.proof_complete,
                "final_proof": self.final_proof
            }

    def __str__(self) -> str:
        """Representation lisible pour debug."""
        return json.dumps(self.get_state_snapshot(summarize=True), indent=2)


# === Test de ProofState ===
print("=== Test ProofState ===")
state = ProofState(
    theorem_statement="theorem add_comm (n m : Nat) : n + m = m + n",
    theorem_goal="n + m = m + n",
    max_iterations=30
)

# Ajouter des elements
lemma_id = state.add_lemma("Nat.add_comm", "n + m = m + n", "Nat", relevance=0.95)
print(f"Lemme ajoute: {lemma_id}")

attempt_id = state.add_tactic_attempt("exact Nat.add_comm n m", "n + m = m + n", confidence=0.8, explanation="Application directe du lemme")
print(f"Tactique ajoutee: {attempt_id}")

verif_id = state.add_verification(attempt_id, success=True, output="Goals accomplished", errors="", remaining_goals=None, exec_time_ms=45.2, backend="subprocess")
print(f"Verification ajoutee: {verif_id}")

state.set_proof_complete("exact Nat.add_comm n m")
print(f"\nEtat final:\n{state}")


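
The designation pattern (`designate_next_agent` / `consume_next_agent_designation`) can be pictured with a small selection function: the designated agent speaks next, and the designation is cleared once read. The sketch below is a standalone illustration; `DesignationBox` and `select_next_agent` are hypothetical names, and falling back to `CoordinatorAgent` is an assumption.

```python
from typing import List, Optional

class DesignationBox:
    """Holds at most one pending 'next agent' designation."""

    def __init__(self) -> None:
        self._next: Optional[str] = None

    def designate(self, name: str) -> None:
        self._next = name

    def consume(self) -> Optional[str]:
        """Return the pending designation and clear it."""
        name, self._next = self._next, None
        return name

def select_next_agent(box: DesignationBox, agents: List[str],
                      default: str = "CoordinatorAgent") -> str:
    """Pick the designated agent if it is valid, otherwise the default."""
    name = box.consume()
    return name if name in agents else default

agents = ["SearchAgent", "TacticAgent", "VerifierAgent",
          "CriticAgent", "CoordinatorAgent"]
box = DesignationBox()
box.designate("TacticAgent")
print(select_next_agent(box, agents))  # TacticAgent
print(select_next_agent(box, agents))  # CoordinatorAgent (designation consumed)
```

Clearing the designation on read prevents a stale choice from being replayed on a later turn.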



### 8.2-8.5 Semantic Kernel Plugins

The architecture uses 4 specialized plugins, each exposing functions via `@kernel_function`:

| Plugin | Role | Key functions |
|--------|------|---------------|
| **ProofStateManagerPlugin** | State management | get_proof_state, add_discovered_lemma, designate_next_agent |
| **LeanSearchPlugin** | Mathlib search | search_mathlib_lemmas, check_lemma_type |
| **LeanTacticPlugin** | Tactic generation | generate_tactics, analyze_tactic_failure |
| **LeanVerificationPlugin** | Lean verification | verify_proof, verify_tactic_step |

This pattern lets agents call these functions automatically through Semantic Kernel's `FunctionChoiceBehavior.Auto()`.

In [6]:
# =============================================================================
# Section 8.2-8.5 - Plugins Semantic Kernel
# =============================================================================
# Architecture en 4 plugins specialises:
# - ProofStateManagerPlugin: Gestion de l'etat partage
# - LeanSearchPlugin: Recherche de lemmes Mathlib
# - LeanTacticPlugin: Generation de tactiques
# - LeanVerificationPlugin: Verification avec lean_runner.py

# Import du decorateur kernel_function
try:
    from semantic_kernel.functions import kernel_function
    SK_AVAILABLE = True
    print("Semantic Kernel disponible - utilisation des vrais decorateurs")
except ImportError:
    SK_AVAILABLE = False
    print("Semantic Kernel non disponible - mode simulation")
    # Decorateur de simulation
    def kernel_function(description="", name=None):
        def decorator(func):
            func._sk_function = True
            func._sk_description = description
            func._sk_name = name or func.__name__
            return func
        return decorator

# =============================================================================
# 8.2 ProofStateManagerPlugin
# =============================================================================

class ProofStateManagerPlugin:
    """
    Plugin pour gerer l'etat partage de la preuve.
    Expose les methodes de ProofState via @kernel_function.
    """

    def __init__(self, state: ProofState):
        self._state = state

    @kernel_function(
        description="Returns an overview of the current proof state (theorem, lemmas, tactics, etc.)",
        name="get_proof_state"
    )
    def get_proof_state(self, summarize: bool = True) -> str:
        """Return the current state as JSON."""
        snapshot = self._state.get_state_snapshot(summarize=summarize)
        return json.dumps(snapshot, indent=2, ensure_ascii=False)

    @kernel_function(
        description="Adds a discovered lemma to the shared state",
        name="add_discovered_lemma"
    )
    def add_discovered_lemma(
        self, name: str, statement: str, namespace: str = "", relevance: float = 0.5
    ) -> str:
        """Record a lemma found by SearchAgent."""
        lemma_id = self._state.add_lemma(name, statement, namespace, relevance)
        return f"Lemma added: {lemma_id} ({name})"

    @kernel_function(
        description="Records a tactic attempt with its confidence level",
        name="log_tactic_attempt"
    )
    def log_tactic_attempt(
        self, tactic: str, state_before: str, confidence: float = 0.5, explanation: str = ""
    ) -> str:
        """Record a tactic attempted by TacticAgent."""
        attempt_id = self._state.add_tactic_attempt(tactic, state_before, confidence, explanation)
        return f"Tactic recorded: {attempt_id}"

    @kernel_function(
        description="Records the result of a Lean verification",
        name="add_verification_result"
    )
    def add_verification_result(
        self, attempt_id: str, success: bool, output: str, errors: str,
        remaining_goals: str = "", exec_time_ms: float = 0.0
    ) -> str:
        """Record a verification result."""
        verif_id = self._state.add_verification(
            attempt_id, success, output, errors,
            remaining_goals if remaining_goals else None, exec_time_ms, "subprocess"
        )
        status = "OK" if success else "FAILED"
        return f"Verification {verif_id}: {status}"

    @kernel_function(
        description="Designates the agent that speaks on the next turn. IMPORTANT: use the exact name.",
        name="designate_next_agent"
    )
    def designate_next_agent(self, agent_name: str) -> str:
        """Hand off to the next agent."""
        valid_agents = ["SearchAgent", "TacticAgent", "VerifierAgent", "CriticAgent", "CoordinatorAgent"]
        if agent_name not in valid_agents:
            return f"ERROR: Invalid agent '{agent_name}'. Valid: {valid_agents}"
        self._state.designate_next_agent(agent_name)
        return f"Next agent: {agent_name}"

    @kernel_function(
        description="Marks the proof as complete with the final code",
        name="set_proof_complete"
    )
    def set_proof_complete(self, proof_code: str) -> str:
        """Mark the proof as successful."""
        self._state.set_proof_complete(proof_code)
        return f"PROOF COMPLETE! Code: {proof_code[:100]}..."

    @kernel_function(
        description="Changes the proof strategy (exploration, refinement, validation, recovery)",
        name="set_proof_strategy"
    )
    def set_proof_strategy(self, strategy: str) -> str:
        """Change the proof strategy."""
        try:
            self._state.set_strategy(ProofStrategy(strategy))
            return f"Strategy changed: {strategy}"
        except ValueError:
            return f"ERROR: Invalid strategy '{strategy}'. Valid: exploration, refinement, validation, recovery"


# =============================================================================
# 8.3 LeanSearchPlugin
# =============================================================================

class LeanSearchPlugin:
    """
    Plugin for searching lemmas in Mathlib.
    Uses known patterns plus #check verification via lean_runner.
    """

    def __init__(self, runner: LeanRunner):
        self._runner = runner
        # Table of known lemmas (extensible)
        self._known_lemmas = {
            # Basic arithmetic
            "Nat.add_zero": ("n + 0 = n", "Nat"),
            "Nat.zero_add": ("0 + n = n", "Nat"),
            "Nat.add_comm": ("n + m = m + n", "Nat"),
            "Nat.add_assoc": ("(n + m) + k = n + (m + k)", "Nat"),
            "Nat.mul_one": ("n * 1 = n", "Nat"),
            "Nat.one_mul": ("1 * n = n", "Nat"),
            "Nat.mul_comm": ("n * m = m * n", "Nat"),
            "Nat.mul_assoc": ("(n * m) * k = n * (m * k)", "Nat"),
            "Nat.left_distrib": ("n * (m + k) = n * m + n * k", "Nat"),
            "Nat.right_distrib": ("(n + m) * k = n * k + m * k", "Nat"),
            # Logic
            "And.intro": ("a -> b -> a /\\ b", "Logic"),
            "And.left": ("a /\\ b -> a", "Logic"),
            "And.right": ("a /\\ b -> b", "Logic"),
            "Or.inl": ("a -> a \\/ b", "Logic"),
            "Or.inr": ("b -> a \\/ b", "Logic"),
            "Eq.refl": ("a = a", "Logic"),
            "Eq.symm": ("a = b -> b = a", "Logic"),
            "Eq.trans": ("a = b -> b = c -> a = c", "Logic"),
        }

    @kernel_function(
        description="Searches for Mathlib lemmas relevant to a given goal",
        name="search_mathlib_lemmas"
    )
    def search_mathlib_lemmas(self, goal: str, max_results: int = 10) -> str:
        """
        Search for lemmas by keyword.

        Args:
            goal: Description of the goal or keywords (e.g. "addition commutative")
            max_results: Maximum number of results

        Returns:
            JSON with the lemmas found
        """
        goal_lower = goal.lower()
        results = []

        # Keyword search: map operators to the words used in lemma names
        raw_keywords = goal_lower.replace("+", "add").replace("*", "mul").replace("=", "eq").split()
        # Drop single-letter variable names such as "n": they are substrings of
        # almost every lemma name ("nat...") and would give every lemma a score
        keywords = [kw for kw in raw_keywords if len(kw) > 1 or kw.isdigit()]

        for name, (statement, namespace) in self._known_lemmas.items():
            score = 0.0
            name_lower = name.lower()

            # Keyword scoring
            for kw in keywords:
                if kw in name_lower:
                    score += 0.3
                if kw in statement.lower():
                    score += 0.2

            # Specific patterns
            if "comm" in goal_lower and "comm" in name_lower:
                score += 0.4
            if "assoc" in goal_lower and "assoc" in name_lower:
                score += 0.4
            if "zero" in goal_lower and "zero" in name_lower:
                score += 0.3
            if "distrib" in goal_lower and "distrib" in name_lower:
                score += 0.4

            if score > 0:
                results.append({
                    "name": name,
                    "statement": statement,
                    "namespace": namespace,
                    "relevance": min(score, 1.0)
                })

        # Sort by relevance
        results.sort(key=lambda x: x["relevance"], reverse=True)
        return json.dumps(results[:max_results], indent=2, ensure_ascii=False)

    @kernel_function(
        description="Checks that a lemma exists and returns its type via #check",
        name="check_lemma_type"
    )
    def check_lemma_type(self, lemma_name: str) -> str:
        """
        Check that a lemma exists via #check.

        Args:
            lemma_name: Name of the lemma (e.g. "Nat.add_comm")

        Returns:
            JSON {exists, type, error}
        """
        code = f"#check {lemma_name}"
        result = self._runner.run(code)

        if result.success and not result.errors:
            # Extract the type from the output
            return json.dumps({
                "exists": True,
                "type": result.output.strip(),
                "error": None
            })
        else:
            return json.dumps({
                "exists": False,
                "type": None,
                "error": result.errors or "Lemma not found"
            })


# =============================================================================
# 8.4 LeanTacticPlugin
# =============================================================================

class LeanTacticPlugin:
    """
    Plugin for tactic generation.
    Provides heuristics and analyzes failures.
    """

    def __init__(self):
        # Tactics grouped by difficulty.
        # "exact?" asks Lean to search for a closing term; "exact ?_" would
        # only turn the goal into a new, identical goal.
        self._tactics = {
            "simple": ["rfl", "trivial", "exact?", "assumption"],
            "medium": ["simp", "omega", "decide", "constructor", "intro", "apply"],
            "complex": ["ring", "linarith", "aesop", "induction", "cases", "rcases"]
        }

        # Heuristics keyed by goal pattern
        self._heuristics = {
            "equality": ["rfl", "exact", "simp", "ring", "omega"],
            "forall": ["intro", "intros", "apply"],
            "exists": ["use", "exists", "exact"],
            "and": ["constructor", "exact And.intro"],
            "or": ["left", "right"],
            "implication": ["intro", "apply", "exact"],
            "nat_arithmetic": ["omega", "simp", "decide"],
            "ring_expression": ["ring", "ring_nf"]
        }

    @kernel_function(
        description="Generates suitable tactics for a given goal",
        name="generate_tactics"
    )
    def generate_tactics(self, goal: str, context: str = "", difficulty: str = "simple") -> str:
        """
        Generate tactics for the current goal.

        Args:
            goal: The Lean goal to prove
            context: Additional context (available lemmas, etc.); currently unused
            difficulty: simple, medium, or complex

        Returns:
            JSON [{tactic, confidence, explanation}]
        """
        suggestions = []
        goal_lower = goal.lower()

        # Detect the kind of goal
        detected_patterns = []
        if "=" in goal:
            detected_patterns.append("equality")
        if "forall" in goal_lower or "∀" in goal:
            detected_patterns.append("forall")
        if "exists" in goal_lower or "∃" in goal:
            detected_patterns.append("exists")
        if "/\\" in goal or "∧" in goal or "And" in goal:
            detected_patterns.append("and")
        if "\\/" in goal or "∨" in goal or "Or" in goal:
            detected_patterns.append("or")
        if "->" in goal or "→" in goal:
            detected_patterns.append("implication")
        if any(x in goal_lower for x in ["nat", "n +", "m +", "+ 0", "0 +"]):
            detected_patterns.append("nat_arithmetic")
        if any(x in goal for x in ["*", "+"]) and "=" in goal:
            detected_patterns.append("ring_expression")

        # Collect the suggested tactics
        seen = set()
        for pattern in detected_patterns:
            for tactic in self._heuristics.get(pattern, []):
                if tactic not in seen:
                    seen.add(tactic)
                    confidence = 0.7 if difficulty == "simple" else 0.5
                    suggestions.append({
                        "tactic": tactic,
                        "confidence": confidence,
                        "explanation": f"Detected pattern: {pattern}"
                    })

        # Add baseline tactics for the requested difficulty
        base_tactics = self._tactics.get(difficulty, self._tactics["simple"])
        for tactic in base_tactics[:3]:
            if tactic not in seen:
                suggestions.append({
                    "tactic": tactic,
                    "confidence": 0.3,
                    "explanation": f"Generic {difficulty} tactic"
                })

        # Sort by confidence
        suggestions.sort(key=lambda x: x["confidence"], reverse=True)
        return json.dumps(suggestions[:8], indent=2, ensure_ascii=False)

    @kernel_function(
        description="Analyzes a tactic failure and suggests alternatives",
        name="analyze_tactic_failure"
    )
    def analyze_tactic_failure(self, failed_tactic: str, error_msg: str) -> str:
        """
        Analyze why a tactic failed.

        Args:
            failed_tactic: The tactic that failed
            error_msg: Lean error message

        Returns:
            JSON {diagnosis, alternatives, error_type, original_error}
        """
        error_lower = error_msg.lower()
        diagnosis = ""
        alternatives = []
        error_type = "unknown"

        # Classify the error
        if "unknown identifier" in error_lower or "unknown constant" in error_lower:
            error_type = "unknown_identifier"
            diagnosis = "Unrecognized lemma or identifier. Check the import or the name."
            alternatives = ["Look up the correct name with #check", "Check the imports"]

        elif "type mismatch" in error_lower:
            error_type = "type_mismatch"
            diagnosis = "Types do not match. Check the arguments."
            alternatives = ["exact", "apply", "simp"]

        elif "unsolved goals" in error_lower or "goals remain" in error_lower:
            error_type = "unsolved_goals"
            diagnosis = "Subgoals remain. The tactic did not complete the proof."
            alternatives = ["Add further tactics", "Try simp", "Decompose with have"]

        elif "tactic failed" in error_lower:
            error_type = "tactic_failed"
            diagnosis = f"The tactic '{failed_tactic}' could not be applied."
            # Suggest alternatives
            if failed_tactic in ["ring", "linarith"]:
                alternatives = ["omega", "simp", "decide"]
            elif failed_tactic == "simp":
                alternatives = ["simp only", "rfl", "exact"]
            else:
                alternatives = ["simp", "omega", "exact?"]

        elif "declaration uses 'sorry'" in error_lower:
            error_type = "sorry"
            diagnosis = "The proof contains 'sorry' and is incomplete."
            alternatives = ["Complete the proof", "Replace sorry with an actual tactic"]

        else:
            error_type = "other"
            diagnosis = f"Unclassified error: {error_msg[:100]}"
            alternatives = ["Check the syntax", "Try a different approach"]

        return json.dumps({
            "diagnosis": diagnosis,
            "alternatives": alternatives,
            "error_type": error_type,
            "original_error": error_msg[:200]
        }, indent=2, ensure_ascii=False)


# =============================================================================
# 8.5 LeanVerificationPlugin
# =============================================================================

class LeanVerificationPlugin:
    """
    Plugin for verifying proofs with lean_runner.
    """

    def __init__(self, runner: LeanRunner):
        self._runner = runner

    @kernel_function(
        description="Verifies a complete proof (theorem + tactics)",
        name="verify_proof"
    )
    def verify_proof(self, theorem_statement: str, proof_tactics: str) -> str:
        """
        Verify a theorem together with its proof.

        Args:
            theorem_statement: The theorem statement (e.g. "theorem add_zero (n : Nat) : n + 0 = n")
            proof_tactics: The proof (e.g. "exact Nat.add_zero n")

        Returns:
            JSON {success, output, errors, exit_code, exec_time_ms, backend, code}
        """
        import time

        # Build the full source. Only look at the start of the proof: a tactic
        # script may legitimately contain ":=" (e.g. in a `have`) or the letters "by".
        proof = proof_tactics.strip()
        if proof.startswith(":="):
            code = f"{theorem_statement} {proof}"
        elif proof == "by" or proof.startswith(("by ", "by\n")):
            code = f"{theorem_statement} := {proof}"
        else:
            code = f"{theorem_statement} := by {proof}"

        # perf_counter is monotonic, which makes it better suited to timing
        start = time.perf_counter()
        result = self._runner.run(code)
        exec_time = (time.perf_counter() - start) * 1000

        return json.dumps({
            "success": result.success,
            "output": result.output,
            "errors": result.errors,
            "exit_code": result.exit_code,
            "exec_time_ms": round(exec_time, 2),
            "backend": result.backend,
            "code": code
        }, indent=2, ensure_ascii=False)

    @kernel_function(
        description="Verifies one incremental tactic step",
        name="verify_tactic_step"
    )
    def verify_tactic_step(
        self, partial_proof: str, next_tactic: str, theorem_statement: str
    ) -> str:
        """
        Verify an incremental tactic.

        Args:
            partial_proof: The tactics already applied (separated by ;)
            next_tactic: The next tactic to try
            theorem_statement: The theorem statement

        Returns:
            JSON {tactic_valid, remaining_goals, error, exec_time_ms, applied_tactics}
        """
        import time

        # Combine the tactics
        if partial_proof:
            all_tactics = f"{partial_proof}; {next_tactic}"
        else:
            all_tactics = next_tactic

        code = f"{theorem_statement} := by {all_tactics}"

        start = time.perf_counter()
        result = self._runner.run(code)
        exec_time = (time.perf_counter() - start) * 1000

        # Analyze the remaining goals (errors may be empty or None)
        errors = result.errors or ""
        has_unsolved_goals = "unsolved goals" in errors.lower()
        # The goals are reported inside the error message itself
        remaining_goals = errors if has_unsolved_goals else None

        return json.dumps({
            # Heuristic: a step is valid if the proof compiles, or if the only
            # complaint is that goals remain (the tactic applied but did not finish)
            "tactic_valid": result.success or has_unsolved_goals,
            "remaining_goals": remaining_goals,
            "error": result.errors if not result.success else None,
            "exec_time_ms": round(exec_time, 2),
            "applied_tactics": all_tactics
        }, indent=2, ensure_ascii=False)


# =============================================================================
# Plugin tests
# =============================================================================

print("\n=== Plugin tests ===")

# Create the state and the runner
test_state = ProofState(theorem_statement="theorem test_add (n : Nat) : n + 0 = n")
runner = LeanRunner(backend="subprocess", timeout=30)

# Instantiate the plugins
state_plugin = ProofStateManagerPlugin(test_state)
search_plugin = LeanSearchPlugin(runner)
tactic_plugin = LeanTacticPlugin()
verif_plugin = LeanVerificationPlugin(runner)

# Test 1: lemma search
print("\n1. Lemma search for 'addition zero':")
lemmas = search_plugin.search_mathlib_lemmas("addition zero", max_results=3)
print(lemmas)

# Test 2: tactic generation
print("\n2. Tactics for 'n + 0 = n':")
tactics = tactic_plugin.generate_tactics("n + 0 = n", difficulty="simple")
print(tactics)

# Test 3: verification with lean_runner
print("\n3. Verifying a proof:")
result = verif_plugin.verify_proof("theorem test_rfl : 2 + 2 = 4", "rfl")
print(result)

# Test 4: StateManager plugin
print("\n4. Adding a lemma via StateManagerPlugin:")
print(state_plugin.add_discovered_lemma("Nat.add_zero", "n + 0 = n", "Nat", 0.9))
print(state_plugin.get_proof_state(summarize=True))


Plugin test:
Task 'task_1' added: Find relevant lemmas
Lemmas found: ['Nat.add_zero', 'Nat.zero_add', 'Nat.add_comm']
Candidate tactics: ['exact Nat.add_zero', 'rw [Nat.add_zero]', 'exact Nat.zero_add', 'rw [Nat.zero_add]', 'exact Nat.add_comm']
Theorem: theorem add_zero (n : Nat) : n + 0 = n
Tasks: 1
Lemmas found: 3
Tactics attempted: 0
Iterations: 0
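
For reference, here is a sketch of the Lean 4 source that `verify_proof` assembles from a statement and a proof, one case per input shape it distinguishes. The theorem names `add_zero_demo` and `zero_add_demo` are illustrative and do not come from the notebook.

```lean
-- Bare tactics ("rfl"): wrapped as ":= by <tactics>"
theorem test_rfl : 2 + 2 = 4 := by rfl

-- Proof that already starts with "by": only ":=" is prepended
theorem add_zero_demo (n : Nat) : n + 0 = n := by exact Nat.add_zero n

-- Proof that already carries its own ":=": appended unchanged
theorem zero_add_demo (n : Nat) : 0 + n = n := Nat.zero_add n
```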



### 8.6 Defining the 5 Specialized Agents

The multi-agent system comprises 5 distinct roles:

| Agent | Role | Plugins | Delegation |
|-------|------|---------|------------|
| **SearchAgent** | Mathlib lemma search | LeanSearch, StateManager | TacticAgent once lemmas are found |
| **TacticAgent** | Tactic generation | LeanTactic, StateManager | VerifierAgent for validation |
| **VerifierAgent** | Lean verification | LeanVerification, StateManager | CriticAgent on failure |
| **CriticAgent** | Failure analysis | LeanTactic, StateManager | Redirects according to the error |
| **CoordinatorAgent** | Global supervision | StateManager | Handles deadlocks |

**Key pattern**: Each agent explicitly designates the next one via `designate_next_agent()`.
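
The handoff pattern can be sketched in isolation as a one-slot mailbox: an agent writes the name of its successor, and the orchestrator reads and clears it. This is a minimal illustration, not the notebook's `ProofState`; the names `HandoffState` and `consume_next_agent` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

VALID_AGENTS = (
    "SearchAgent", "TacticAgent", "VerifierAgent", "CriticAgent", "CoordinatorAgent",
)


@dataclass
class HandoffState:
    """Hypothetical stand-in for the shared state: holds only the handoff slot."""
    next_agent: Optional[str] = None

    def designate_next_agent(self, agent_name: str) -> str:
        # Reject unknown names so that a typo cannot silently stall the session
        if agent_name not in VALID_AGENTS:
            return f"ERROR: invalid agent '{agent_name}'"
        self.next_agent = agent_name
        return f"Next agent: {agent_name}"

    def consume_next_agent(self, default: str = "CoordinatorAgent") -> str:
        # A designation is valid for one turn only, so clear it once read
        chosen = self.next_agent or default
        self.next_agent = None
        return chosen


state = HandoffState()
state.designate_next_agent("TacticAgent")
first = state.consume_next_agent()   # the designated agent
second = state.consume_next_agent()  # nothing designated: falls back to the default
print(first, second)
```

Clearing the slot after each read is the design choice that matters here: an agent that forgets to delegate then falls through to the default agent instead of replaying a stale designation.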

In [7]:
# =============================================================================
# Section 8.6 - Defining the 5 Specialized Agents
# =============================================================================
# Pattern inspired by Argument_Analysis_Agentic-3-orchestration.ipynb

# --- Agent instructions ---

SEARCH_AGENT_INSTRUCTIONS = """
You are the lemma SEARCH agent for theorem proving in Lean 4.

YOUR SOLE ROLE:
- Search for Mathlib lemmas relevant to the current theorem
- Identify the lemmas that can help with the proof
- Record the lemmas found in the shared state

WORKFLOW:
1. Read the state with get_proof_state() to understand the theorem
2. Use search_mathlib_lemmas() with relevant keywords
3. Check promising lemmas with check_lemma_type()
4. Record useful lemmas with add_discovered_lemma()
5. Hand off to TacticAgent once you have found lemmas

AVAILABLE TOOLS:
- get_proof_state: Read the current state
- search_mathlib_lemmas: Search for lemmas
- check_lemma_type: Check a lemma
- add_discovered_lemma: Record a lemma
- designate_next_agent: Hand off to the next agent

IMPORTANT:
- Search for lemmas RELATED to the goal (equalities, arithmetic, logic)
- Delegation: once you have found at least 2-3 lemmas, hand off to TacticAgent
- If no lemma is relevant, hand off to TacticAgent anyway
"""

TACTIC_AGENT_INSTRUCTIONS = """
You are the TACTIC GENERATION agent for theorem proving in Lean 4.

YOUR SOLE ROLE:
- Generate sequences of Lean tactics to prove the goal
- Use the lemmas found by SearchAgent
- Propose tactics with a confidence level

WORKFLOW:
1. Read the state with get_proof_state() to see the theorem and the lemmas
2. Use generate_tactics() to obtain suggestions
3. Record your best attempt with log_tactic_attempt()
4. Hand off to VerifierAgent for verification

TACTIC STRATEGIES (by difficulty):
- SIMPLE: rfl, trivial, exact, assumption
- MEDIUM: simp, omega, constructor, intro, apply
- COMPLEX: ring, linarith, induction, cases

AVAILABLE TOOLS:
- get_proof_state: Read the state (theorem, lemmas, previous failures)
- generate_tactics: Obtain tactic suggestions
- log_tactic_attempt: Record an attempt
- designate_next_agent: Hand off to VerifierAgent

IMPORTANT:
- Start with the simple tactics (rfl, exact)
- Use the lemmas found by SearchAgent (exact Nat.add_zero n)
- Delegation: ALWAYS hand off to VerifierAgent after making a proposal
"""

VERIFIER_AGENT_INSTRUCTIONS = """
You are the VERIFICATION agent for theorem proving in Lean 4.

YOUR SOLE ROLE:
- Verify the proposed tactics with the Lean compiler
- Record the verification results
- Determine whether the proof is complete or work must continue

WORKFLOW:
1. Read the state with get_proof_state() to see the latest tactic
2. Use verify_proof() to test the proof
3. Record the result with add_verification_result()
4. On success: call set_proof_complete() and stop
5. On failure: hand off to CriticAgent for analysis

AVAILABLE TOOLS:
- get_proof_state: Read the state (theorem, attempted tactics)
- verify_proof: Verify a complete proof
- verify_tactic_step: Verify an incremental tactic
- add_verification_result: Record the result
- set_proof_complete: Mark the proof as complete
- designate_next_agent: Hand off to CriticAgent on failure

IMPORTANT:
- ALWAYS test the most recently proposed tactic
- If the proof compiles without errors, use set_proof_complete()
- On failure, record the error and hand off to CriticAgent
"""

CRITIC_AGENT_INSTRUCTIONS = """
You are the CRITIC agent for theorem proving in Lean 4.

YOUR SOLE ROLE:
- Analyze verification failures
- Diagnose Lean errors
- Steer toward the right correction strategy

WORKFLOW:
1. Read the state with get_proof_state() to see the recent failures
2. Use analyze_tactic_failure() to understand the error
3. Decide which direction to take:
   - "unknown identifier" -> hand off to SearchAgent
   - "type mismatch" or "tactic failed" -> hand off to TacticAgent (higher difficulty)
   - Repeated failures (>3) -> hand off to CoordinatorAgent

ERROR PATTERNS:
- "unknown identifier/constant" : Lemma not found -> SearchAgent
- "type mismatch" : Incorrect arguments -> TacticAgent
- "unsolved goals" : Incomplete proof -> TacticAgent
- "tactic failed" : Wrong tactic -> TacticAgent

AVAILABLE TOOLS:
- get_proof_state: Read the state (failures, iterations)
- analyze_tactic_failure: Diagnose an error
- set_proof_strategy: Change the strategy (recovery, refinement)
- designate_next_agent: Route according to the diagnosis

IMPORTANT:
- Analyze the last 3 failures to detect patterns
- If there are >3 similar failures, hand off to CoordinatorAgent
- Suggest raising the tactic difficulty if necessary
"""

COORDINATOR_AGENT_INSTRUCTIONS = """
You are the COORDINATOR agent (supervisor) for theorem proving in Lean 4.

YOUR SOLE ROLE:
- Supervise the whole proof session
- Break out of cyclic situations
- Adjust the overall strategy

WHEN YOU STEP IN:
- Called by CriticAgent after repeated failures
- Called when max_iterations is getting close
- Called for major strategic decisions

AVAILABLE STRATEGIES:
- "exploration": Initial phase, lemma search (default)
- "refinement": Refine the tactics after first successes
- "validation": Final phase, complete verification
- "recovery": Recovery mode after major failures

WORKFLOW:
1. Read the state with get_proof_state(summarize=False) to see everything
2. Analyze the history of failures and tactics
3. Decide on the next action:
   - Too many searches without results? -> TacticAgent with difficulty "complex"
   - Loop detected? -> Change strategy + SearchAgent
   - Close to success? -> VerifierAgent directly

AVAILABLE TOOLS:
- get_proof_state: Full view of the state
- set_proof_strategy: Change the overall strategy
- designate_next_agent: Route to the right agent

IMPORTANT:
- You are the last resort, so make bold decisions
- If >40 iterations, suggest simplifying the theorem
- Keep the big picture in view, not just the latest failure
"""

# =============================================================================
# SimpleAgent class (simulation without the real Semantic Kernel)
# =============================================================================

class SimpleAgent:
    """
    Simplified agent for simulation.
    In production, use Semantic Kernel's ChatCompletionAgent.
    """

    def __init__(
        self,
        name: str,
        instructions: str,
        plugins: Dict[str, Any],
        use_simulation: bool = True
    ):
        self.name = name
        self.instructions = instructions
        self.plugins = plugins
        self.use_simulation = use_simulation
        self._openai_client = None

        # Initialize the OpenAI client if available
        if not use_simulation:
            try:
                from openai import OpenAI
                api_key = os.getenv("OPENAI_API_KEY")
                if api_key and len(api_key) > 10 and not api_key.startswith("sk-..."):
                    self._openai_client = OpenAI(api_key=api_key)
            except ImportError:
                pass

    def _build_tool_descriptions(self) -> str:
        """Build the description of the available tools."""
        tools = []
        for plugin_name, plugin in self.plugins.items():
            for attr_name in dir(plugin):
                attr = getattr(plugin, attr_name)
                if callable(attr) and hasattr(attr, '_sk_function'):
                    tools.append(f"- {attr._sk_name}: {attr._sk_description}")
        return "\n".join(tools)

    def invoke(self, message: str, state: ProofState) -> str:
        """Run the agent on a message."""
        state.increment_iteration()

        if self.use_simulation or not self._openai_client:
            return self._simulate_response(message, state)
        else:
            return self._call_llm(message, state)

    def _simulate_response(self, message: str, state: ProofState) -> str:
        """Simulate the agent (no LLM calls)."""
        # Simulated logic, per agent
        if self.name == "SearchAgent":
            # Simulate the lemma search
            search = self.plugins.get("search")
            state_mgr = self.plugins.get("state")
            if search and state_mgr:
                # Search for lemmas
                lemmas_json = search.search_mathlib_lemmas(state.theorem_goal or "addition", max_results=3)
                lemmas = json.loads(lemmas_json)
                for lemma in lemmas[:2]:
                    state_mgr.add_discovered_lemma(lemma["name"], lemma["statement"], lemma["namespace"], lemma["relevance"])
                state_mgr.designate_next_agent("TacticAgent")
                return f"[SearchAgent] Found {len(lemmas[:2])} lemmas. Handing off to TacticAgent."

        elif self.name == "TacticAgent":
            # Simulate tactic generation
            tactic = self.plugins.get("tactic")
            state_mgr = self.plugins.get("state")
            if tactic and state_mgr:
                tactics_json = tactic.generate_tactics(state.theorem_goal or "n + 0 = n", difficulty="simple")
                tactics = json.loads(tactics_json)
                if tactics:
                    best = tactics[0]
                    state_mgr.log_tactic_attempt(best["tactic"], state.theorem_goal or "", best["confidence"], best["explanation"])
                state_mgr.designate_next_agent("VerifierAgent")
                return f"[TacticAgent] Proposing: {tactics[0]['tactic'] if tactics else 'rfl'}. Handing off to VerifierAgent."

        elif self.name == "VerifierAgent":
            # Simulate verification
            verif = self.plugins.get("verification")
            state_mgr = self.plugins.get("state")
            if verif and state_mgr and state.tactics_history:
                last_tactic = state.tactics_history[-1]
                result_json = verif.verify_proof(state.theorem_statement, last_tactic.tactic)
                result = json.loads(result_json)
                state_mgr.add_verification_result(
                    last_tactic.id, result["success"], result["output"], result["errors"],
                    "", result["exec_time_ms"]
                )
                if result["success"]:
                    state_mgr.set_proof_complete(last_tactic.tactic)
                    return f"[VerifierAgent] SUCCESS! Proof: {last_tactic.tactic}"
                else:
                    state_mgr.designate_next_agent("CriticAgent")
                    return f"[VerifierAgent] Failure: {result['errors'][:100]}. Handing off to CriticAgent."

        elif self.name == "CriticAgent":
            # Simulate failure analysis
            tactic = self.plugins.get("tactic")
            state_mgr = self.plugins.get("state")
            if tactic and state_mgr:
                failures = state.get_recent_failures(3)
                if failures:
                    _, last_verif = failures[0]
                    analysis_json = tactic.analyze_tactic_failure("unknown", last_verif.errors)
                    analysis = json.loads(analysis_json)
                    if analysis["error_type"] == "unknown_identifier":
                        state_mgr.designate_next_agent("SearchAgent")
                        return "[CriticAgent] Unknown identifier. Back to SearchAgent."
                    # get_recent_failures(3) is expected to return at most 3 entries,
                    # so the former test `> 3` could never trigger the escalation
                    elif len(failures) >= 3:
                        state_mgr.designate_next_agent("CoordinatorAgent")
                        return f"[CriticAgent] Too many failures ({len(failures)}). Calling CoordinatorAgent."
                    else:
                        state_mgr.designate_next_agent("TacticAgent")
                        return "[CriticAgent] Try other tactics. -> TacticAgent."
                state_mgr.designate_next_agent("TacticAgent")
                return "[CriticAgent] No recent failures. -> TacticAgent."

        elif self.name == "CoordinatorAgent":
            state_mgr = self.plugins.get("state")
            if state_mgr:
                if state.iteration_count > 30:
                    state_mgr.set_proof_strategy("recovery")
                    return "[CoordinatorAgent] Recovery mode enabled. The theorem may be too complex."
                else:
                    state_mgr.set_proof_strategy("refinement")
                    state_mgr.designate_next_agent("TacticAgent")
                    return "[CoordinatorAgent] Refinement strategy. -> TacticAgent with higher difficulty."

        return f"[{self.name}] Simulated action."

    def _call_llm(self, message: str, state: ProofState) -> str:
        """Call the OpenAI LLM."""
        # Note: in production, use Semantic Kernel with function calling
        tools_desc = self._build_tool_descriptions()
        state_summary = json.dumps(state.get_state_snapshot(summarize=True), indent=2)

        # The instructions go in the system message; the user message carries
        # the tool list, the state and the request (the prompt built here was
        # previously never sent, so the model did not see the tool list)
        user_content = (
            f"AVAILABLE TOOLS:\n{tools_desc}\n\n"
            f"CURRENT STATE:\n{state_summary}\n\n"
            f"MESSAGE:\n{message}\n\n"
            "Reply with the action to perform."
        )
        try:
            response = self._openai_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": self.instructions},
                    {"role": "user", "content": user_content}
                ],
                max_tokens=500,
                temperature=0.3
            )
            return f"[{self.name}] {response.choices[0].message.content}"
        except Exception as e:
            return f"[{self.name}] LLM error: {e}"


# =============================================================================
# Test des Agents
# =============================================================================

print("\n=== Test des Agents ===")

# Creer l'environnement
test_state = ProofState(
    theorem_statement="theorem add_zero (n : Nat) : n + 0 = n",
    theorem_goal="n + 0 = n"
)
runner = LeanRunner(backend="subprocess", timeout=30)

# Creer les plugins
plugins = {
    "state": ProofStateManagerPlugin(test_state),
    "search": LeanSearchPlugin(runner),
    "tactic": LeanTacticPlugin(),
    "verification": LeanVerificationPlugin(runner)
}

# Creer les agents (mode simulation)
agents = {
    "SearchAgent": SimpleAgent("SearchAgent", SEARCH_AGENT_INSTRUCTIONS, plugins, use_simulation=True),
    "TacticAgent": SimpleAgent("TacticAgent", TACTIC_AGENT_INSTRUCTIONS, plugins, use_simulation=True),
    "VerifierAgent": SimpleAgent("VerifierAgent", VERIFIER_AGENT_INSTRUCTIONS, plugins, use_simulation=True),
    "CriticAgent": SimpleAgent("CriticAgent", CRITIC_AGENT_INSTRUCTIONS, plugins, use_simulation=True),
    "CoordinatorAgent": SimpleAgent("CoordinatorAgent", COORDINATOR_AGENT_INSTRUCTIONS, plugins, use_simulation=True),
}

# Tester un agent
print("\nTest SearchAgent:")
response = agents["SearchAgent"].invoke("Trouve des lemmes pour n + 0 = n", test_state)
print(response)
print(f"Etat apres SearchAgent:\n{test_state}")


python-dotenv non installe, utilisation des variables systeme
Agents definis:
  - OrchestratorAgent
  - SearchAgent
  - TacticAgent
  - VerifierAgent

Mode: Simulation (fallback)


### 8.7 Orchestration Strategies

Orchestration determines how the next agent is selected and when the conversation ends.

**DelegatingSelectionStrategy** (recommended pattern):
- Each agent explicitly designates its successor via `designate_next_agent()`
- If no agent has been designated, a default agent is used (`CoordinatorAgent` unless configured otherwise)

**ProofCompleteTermination**:
- Terminates when `proof_complete == True`
- Terminates when `iteration_count >= max_iterations`
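
As a rough, self-contained illustration of these two rules (not the notebook's actual classes), the sketch below uses a hypothetical `ToyState` with a one-slot designation and an iteration counter. The names `ToyState`, `pick_next` and `should_stop` are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyState:
    """Hypothetical stand-in for the shared proof state."""
    proof_complete: bool = False
    iteration_count: int = 0
    next_agent: Optional[str] = None


def pick_next(state: ToyState, default_agent: str = "CoordinatorAgent") -> str:
    """Delegation rule: honor the pending designation once, else fall back."""
    designated, state.next_agent = state.next_agent, None
    return designated or default_agent


def should_stop(state: ToyState, max_iterations: int = 5) -> bool:
    """Termination rule: proof found or iteration budget exhausted."""
    return state.proof_complete or state.iteration_count >= max_iterations


state = ToyState()
first = pick_next(state)            # no designation yet, so the default is used
state.next_agent = "TacticAgent"    # an agent designates its successor
second = pick_next(state)           # the designation is consumed here
third = pick_next(state)            # nothing pending again, back to the default
print(first, second, third)
```

The designation is cleared as soon as it is read, so a stale hand-off cannot be replayed on a later turn.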

### 8.8 Complete Demonstration

This demonstration runs the complete multi-agent workflow:
1. **CoordinatorAgent** initializes the session
2. **SearchAgent** looks up relevant lemmas
3. **TacticAgent** proposes tactics
4. **VerifierAgent** checks them with Lean
5. **CriticAgent** steps in when a verification fails

In [8]:
# =============================================================================
# Section 8.7 - Strategies d'Orchestration
# =============================================================================

from abc import ABC, abstractmethod

class SelectionStrategy(ABC):
    """Strategie abstraite de selection de l'agent suivant."""

    @abstractmethod
    def select_next(self, agents: Dict[str, SimpleAgent], state: ProofState) -> str:
        """Retourne le nom de l'agent a activer."""
        pass


class DelegatingSelectionStrategy(SelectionStrategy):
    """
    Selection basee sur delegation explicite.

    L'agent courant designe le prochain via state.designate_next_agent().
    Si aucune designation, utilise l'agent par defaut.

    C'est le pattern recommande pour le theorem proving car il permet
    une orchestration flexible basee sur le contexte.
    """

    def __init__(self, default_agent: str = "CoordinatorAgent"):
        self.default_agent = default_agent

    def select_next(self, agents: Dict[str, SimpleAgent], state: ProofState) -> str:
        designated = state.consume_next_agent_designation()
        if designated and designated in agents:
            return designated
        return self.default_agent


class SequentialSelectionStrategy(SelectionStrategy):
    """
    Selection sequentielle simple.

    Suit un ordre predetermine: SearchAgent -> TacticAgent -> VerifierAgent.
    Utile pour les preuves simples qui ne necessitent pas de retour arriere.
    """

    def __init__(self, sequence: List[str] = None):
        self.sequence = sequence or ["SearchAgent", "TacticAgent", "VerifierAgent"]
        self.current_index = 0

    def select_next(self, agents: Dict[str, SimpleAgent], state: ProofState) -> str:
        agent = self.sequence[self.current_index % len(self.sequence)]
        self.current_index += 1
        return agent


class TerminationStrategy(ABC):
    """Strategie abstraite de terminaison."""

    @abstractmethod
    def should_terminate(self, state: ProofState) -> bool:
        """Retourne True si la conversation doit se terminer."""
        pass


class ProofCompleteTermination(TerminationStrategy):
    """
    Termine quand la preuve est trouvee ou max iterations atteint.
    """

    def __init__(self, max_iterations: int = 50):
        self.max_iterations = max_iterations

    def should_terminate(self, state: ProofState) -> bool:
        # Terminaison si preuve trouvee
        if state.proof_complete:
            return True
        # Terminaison si max iterations
        if state.iteration_count >= self.max_iterations:
            return True
        return False


# =============================================================================
# AgentGroupChat - Orchestration Multi-Agents
# =============================================================================

class AgentGroupChat:
    """
    Conversation multi-agents pour theorem proving.

    Coordonne plusieurs agents selon les strategies de selection et terminaison.
    Chaque tour:
    1. Selectionne l'agent suivant (selon SelectionStrategy)
    2. L'agent agit et modifie l'etat
    3. Verifie si terminaison (selon TerminationStrategy)
    """

    def __init__(
        self,
        agents: Dict[str, SimpleAgent],
        state: ProofState,
        selection_strategy: SelectionStrategy,
        termination_strategy: TerminationStrategy
    ):
        self.agents = agents
        self.state = state
        self.selection = selection_strategy
        self.termination = termination_strategy
        self.history: List[Dict[str, Any]] = []

    def run(self, initial_message: str, verbose: bool = True) -> str:
        """
        Execute la conversation multi-agents.

        Args:
            initial_message: Message initial (typiquement le theoreme)
            verbose: Afficher les logs de progression

        Returns:
            Resultat de la session (preuve ou message d'echec)
        """
        if verbose:
            print("=" * 60)
            print(f"Session demarree: {initial_message[:80]}...")
            print("=" * 60)

        current_message = initial_message

        while not self.termination.should_terminate(self.state):
            # 1. Selectionner l'agent
            agent_name = self.selection.select_next(self.agents, self.state)
            agent = self.agents.get(agent_name)

            if not agent:
                print(f"ERREUR: Agent '{agent_name}' non trouve!")
                break

            # 2. Executer l'agent
            if verbose:
                print(f"\n[Tour {self.state.iteration_count + 1}] {agent_name}")

            response = agent.invoke(current_message, self.state)

            # 3. Enregistrer dans l'historique
            self.history.append({
                "iteration": self.state.iteration_count,
                "agent": agent_name,
                "message": current_message[:100],
                "response": response[:200],
                "state_snapshot": self.state.get_state_snapshot(summarize=True)
            })

            if verbose:
                print(response)

            # 4. Preparer le prochain tour
            current_message = response

        # Resume final
        if verbose:
            print("\n" + "=" * 60)
            if self.state.proof_complete:
                print("SUCCES! Preuve trouvee:")
                print(self.state.final_proof)
            else:
                print(f"Session terminee apres {self.state.iteration_count} iterations.")
                print("Preuve non trouvee.")
            print("=" * 60)

        return self.state.final_proof or "Preuve non trouvee"

    def get_timeline(self) -> List[Dict[str, Any]]:
        """Retourne l'historique pour visualisation."""
        return self.history


# Test des strategies
print("=== Test des Strategies ===")

test_state = ProofState(
    theorem_statement="theorem test (n : Nat) : n = n",
    theorem_goal="n = n",
    max_iterations=5
)

selection = DelegatingSelectionStrategy(default_agent="CoordinatorAgent")
termination = ProofCompleteTermination(max_iterations=5)

# Pass the real agent registry: a designation is only honored when it names a known agent
print(f"Selection sans designation: {selection.select_next(agents, test_state)}")
test_state.designate_next_agent("TacticAgent")
print(f"Selection avec designation: {selection.select_next(agents, test_state)}")

print(f"Terminaison (iteration 0): {termination.should_terminate(test_state)}")
test_state.iteration_count = 10
print(f"Terminaison (iteration 10): {termination.should_terminate(test_state)}")


Strategies d'orchestration definies:
  - DelegatingSelectionStrategy: Selection par delegation explicite
  - RoundRobinStrategy: Selection cyclique
  - ProofCompleteTermination: Termine quand preuve trouvee


In [9]:
# =============================================================================
# Section 8.8 - Demonstration Complete
# =============================================================================

def prove_with_multi_agents(
    theorem: str,
    goal: str = "",
    max_iterations: int = 20,
    verbose: bool = True,
    use_simulation: bool = True
) -> Dict[str, Any]:
    """
    Prouve un theoreme en utilisant le systeme multi-agents.

    Args:
        theorem: L'enonce du theoreme complet
        goal: Le but a prouver (extrait du theoreme si non fourni)
        max_iterations: Nombre maximum d'iterations
        verbose: Afficher les logs
        use_simulation: Utiliser le mode simulation (sans appels LLM)

    Returns:
        Dict avec resultats et metriques
    """
    import time
    start_time = time.time()

    # 1. Creer l'etat
    if not goal:
        # Extraire le but du theoreme
        if ":" in theorem:
            goal = theorem.split(":")[-1].strip()

    state = ProofState(
        theorem_statement=theorem,
        theorem_goal=goal,
        max_iterations=max_iterations
    )

    # 2. Creer le runner Lean
    runner = LeanRunner(backend="subprocess", timeout=30)

    # 3. Creer les plugins
    plugins = {
        "state": ProofStateManagerPlugin(state),
        "search": LeanSearchPlugin(runner),
        "tactic": LeanTacticPlugin(),
        "verification": LeanVerificationPlugin(runner)
    }

    # 4. Creer les agents
    agents = {
        "SearchAgent": SimpleAgent("SearchAgent", SEARCH_AGENT_INSTRUCTIONS, plugins, use_simulation),
        "TacticAgent": SimpleAgent("TacticAgent", TACTIC_AGENT_INSTRUCTIONS, plugins, use_simulation),
        "VerifierAgent": SimpleAgent("VerifierAgent", VERIFIER_AGENT_INSTRUCTIONS, plugins, use_simulation),
        "CriticAgent": SimpleAgent("CriticAgent", CRITIC_AGENT_INSTRUCTIONS, plugins, use_simulation),
        "CoordinatorAgent": SimpleAgent("CoordinatorAgent", COORDINATOR_AGENT_INSTRUCTIONS, plugins, use_simulation),
    }

    # 5. Configurer les strategies
    selection = DelegatingSelectionStrategy(default_agent="SearchAgent")  # Commence par la recherche
    termination = ProofCompleteTermination(max_iterations=max_iterations)

    # 6. Creer le groupe de chat
    chat = AgentGroupChat(
        agents=agents,
        state=state,
        selection_strategy=selection,
        termination_strategy=termination
    )

    # 7. Executer
    result = chat.run(f"Prouver: {theorem}", verbose=verbose)

    # 8. Collecter les metriques
    elapsed = time.time() - start_time
    metrics = {
        "success": state.proof_complete,
        "theorem": theorem,
        "final_proof": state.final_proof,
        "iterations": state.iteration_count,
        "lemmas_discovered": len(state.discovered_lemmas),
        "tactics_tried": len(state.tactics_history),
        "verifications": len(state.verification_results),
        "total_time_s": round(elapsed, 2),
        "lean_time_ms": round(state.total_lean_time_ms, 2),
        "timeline": chat.get_timeline()
    }

    return metrics


# =============================================================================
# Exemples de Theoremes par Niveau
# =============================================================================

LEVEL_1_THEOREMS = [
    ("theorem test_rfl (n : Nat) : n = n", "n = n"),
    ("theorem test_trivial : True", "True"),
]

LEVEL_2_THEOREMS = [
    ("theorem add_zero (n : Nat) : n + 0 = n", "n + 0 = n"),
    ("theorem add_comm (n m : Nat) : n + m = m + n", "n + m = m + n"),
]

LEVEL_3_THEOREMS = [
    ("theorem add_assoc (a b c : Nat) : (a + b) + c = a + (b + c)", "(a + b) + c = a + (b + c)"),
]

# =============================================================================
# Demonstration
# =============================================================================

print("\n" + "=" * 70)
print("DEMONSTRATION DU SYSTEME MULTI-AGENTS POUR THEOREM PROVING")
print("=" * 70)

# Detect the mode (bool() keeps api_available a real boolean even when the key is unset)
api_key = os.getenv("OPENAI_API_KEY")
api_available = bool(api_key) and len(api_key) > 10 and not api_key.startswith("sk-...")
mode = "LLM" if api_available else "Simulation"
print(f"\nMode: {mode}")
print("-" * 70)

# Test sur un theoreme simple (Niveau 1)
print("\n### Niveau 1: Theoreme trivial ###")
theorem, goal = LEVEL_1_THEOREMS[0]
result = prove_with_multi_agents(
    theorem=theorem,
    goal=goal,
    max_iterations=10,
    verbose=True,
    use_simulation=not api_available
)

print(f"\nResultat: {'SUCCES' if result['success'] else 'ECHEC'}")
print(f"Iterations: {result['iterations']}")
print(f"Temps total: {result['total_time_s']}s")

# Test sur un theoreme moyen (Niveau 2)
print("\n\n### Niveau 2: Addition ###")
theorem, goal = LEVEL_2_THEOREMS[0]
result = prove_with_multi_agents(
    theorem=theorem,
    goal=goal,
    max_iterations=15,
    verbose=True,
    use_simulation=not api_available
)

print(f"\nResultat: {'SUCCES' if result['success'] else 'ECHEC'}")
print(f"Iterations: {result['iterations']}")
print(f"Lemmes decouverts: {result['lemmas_discovered']}")
print(f"Tactiques testees: {result['tactics_tried']}")


Mode d'execution: Simulation
------------------------------------------------------------

DEMARRAGE DE LA CONVERSATION MULTI-AGENTS
Objectif: Prouver: theorem add_zero (n : Nat) : n + 0 = n

--- Tour 1: OrchestratorAgent ---
(OrchestratorAgent) Je delegue a SearchAgent.
Theoreme: theorem add_zero (n : Nat) : n + 0 = n
Taches: 0
Lemmes trouves: 0
Tactiques tentees: 0
Iterations: 1


--- Tour 2: SearchAgent ---
(SearchAgent) Lemmes trouves: ['Nat.add_zero', 'Nat.zero_add', 'Nat.add_comm']

--- Tour 3: TacticAgent ---
(TacticAgent) Tactiques candidates: ['exact Nat.add_zero', 'rw [Nat.add_zero]', 'exact Nat.zero_add', 'rw [Nat.zero_add]', 'exact Nat.add_comm']

--- Tour 4: VerifierAgent ---
(VerifierAgent) SUCCES: Preuve verifiee!

SUCCES apres 4 iterations!
Preuve: exact Nat.add_zero


## 6. Harmonic Aristotle Techniques

### 6.1 Problem decomposition

Aristotle breaks complex problems down into simpler subproblems.
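
To make the idea concrete on the Lean side, here is a small sketch (an illustration of structural decomposition in general, not a description of how Aristotle works internally). In Lean 4, the `constructor` tactic splits a conjunction or an equivalence into two goals that can then be proved independently; the hypothesis names are chosen for this example.

```lean
-- A conjunction splits into one goal per component.
example (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
  constructor
  · exact hp
  · exact hq

-- An equivalence splits into its two directions.
example (p q : Prop) (hpq : p → q) (hqp : q → p) : p ↔ q := by
  constructor
  · exact hpq
  · exact hqp
```

Each bullet corresponds to one subproblem, which is the shape a decomposer has to produce before handing the pieces to a solver.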

In [10]:
class AristotleDecomposer:
    """
    Decomposition de problemes a la Harmonic Aristotle.
    """
    
    def decompose(self, theorem: str) -> List[str]:
        """
        Decompose un theoreme en sous-lemmes.
        
        Strategy:
        1. Identifier la structure (conjonction, equivalence, etc.)
        2. Separer en composantes
        3. Identifier les dependances
        """
        subproblems = []
        
        # Basic decomposition by structure. Each branch tests for the connective
        # it actually splits on: matching the words "iff" or "and" would fire on
        # unrelated text and leave nothing to split.
        if "<->" in theorem:
            # Equivalence = two implications
            parts = theorem.split("<->", 1)
            subproblems.append(f"Direction 1: {parts[0]} -> {parts[1]}")
            subproblems.append(f"Direction 2: {parts[1]} -> {parts[0]}")
        
        elif "/\\" in theorem:
            # Conjunction = prove each part
            parts = theorem.split("/\\")
            for i, part in enumerate(parts):
                subproblems.append(f"Partie {i+1}: {part.strip()}")
        
        elif "forall" in theorem.lower():
            # Universal = fix a variable, prove the body for an arbitrary value
            subproblems.append("Generalisation: introduire variable, prouver corps")
        
        elif "exists" in theorem.lower():
            # Existential = find a witness, then prove the property for it
            subproblems.append("Temoin: trouver valeur concrete")
            subproblems.append("Verification: prouver pour ce temoin")
        
        else:
            # No obvious decomposition
            subproblems.append(theorem)
        
        return subproblems
    
    def solve_hierarchical(self, theorem: str, solver) -> Tuple[bool, str]:
        """
        Resolution hierarchique par decomposition.
        """
        subproblems = self.decompose(theorem)
        
        if len(subproblems) == 1 and subproblems[0] == theorem:
            # Cas de base: resoudre directement
            return solver(theorem)
        
        # Resoudre chaque sous-probleme
        solutions = []
        for sub in subproblems:
            success, proof = self.solve_hierarchical(sub, solver)
            if not success:
                return False, None
            solutions.append(proof)
        
        # Combiner les solutions
        combined = self._combine_proofs(solutions)
        return True, combined
    
    def _combine_proofs(self, proofs: List[str]) -> str:
        """Combine des preuves de sous-problemes."""
        return "\n".join([
            f"-- Partie {i+1}\n{proof}" 
            for i, proof in enumerate(proofs)
        ])

# Test
decomposer = AristotleDecomposer()
subproblems = decomposer.decompose("P <-> Q")
print("Decomposition de 'P <-> Q':")
for sp in subproblems:
    print(f"  - {sp}")

Decomposition de 'P <-> Q':
  - Direction 1: P  ->  Q
  - Direction 2:  Q -> P 


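## 7. Benchmarking on Erdős Problems

Erdős problems have become a reference benchmark for evaluating automated theorem-proving systems, and several were solved with AI in 2025-2026. The benchmark below does not use actual Erdős problems: it runs on simplified, Erdős-style placeholders (basic `Nat` arithmetic) so that the pipeline can be exercised quickly.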
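
A single overall percentage hides where a prover starts to fail, so a breakdown by difficulty level can be useful. The sketch below is a minimal, assumed post-processing step: the `problems` and `outcomes` values are made-up placeholders for illustration, not measured results, and `success_rate_by_difficulty` is a hypothetical helper that is not part of the notebook.

```python
from collections import defaultdict
from typing import Dict, List

# Made-up placeholder data, shaped like the benchmark entries (id + difficulty)
problems: List[dict] = [
    {"id": "simple_1", "difficulty": 1},
    {"id": "simple_2", "difficulty": 2},
    {"id": "medium_1", "difficulty": 3},
    {"id": "medium_2", "difficulty": 3},
]
# Made-up outcomes: problem id -> whether a proof was found
outcomes: Dict[str, bool] = {
    "simple_1": True,
    "simple_2": True,
    "medium_1": False,
    "medium_2": True,
}


def success_rate_by_difficulty(problems: List[dict],
                               outcomes: Dict[str, bool]) -> Dict[int, float]:
    """Return the fraction of solved problems for each difficulty level."""
    counts = defaultdict(lambda: [0, 0])  # difficulty -> [solved, total]
    for problem in problems:
        bucket = counts[problem["difficulty"]]
        bucket[0] += int(outcomes.get(problem["id"], False))
        bucket[1] += 1
    return {level: solved / total
            for level, (solved, total) in sorted(counts.items())}


rates = success_rate_by_difficulty(problems, outcomes)
for level, rate in rates.items():
    print(f"difficulty {level}: {rate:.0%}")
```

A problem with no recorded outcome counts as unsolved, which keeps the rate conservative.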

In [11]:
# Benchmark sur des problemes type Erdos (simplifies)

BENCHMARK_PROBLEMS = [
    {
        "id": "simple_1",
        "name": "Addition zero",
        "statement": "theorem add_zero (n : Nat) : n + 0 = n",
        "difficulty": 1,
        "expected_tactics": ["exact Nat.add_zero n", "rfl"]
    },
    {
        "id": "simple_2", 
        "name": "Commutativite addition",
        "statement": "theorem add_comm (a b : Nat) : a + b = b + a",
        "difficulty": 2,
        "expected_tactics": ["exact Nat.add_comm a b"]
    },
    {
        "id": "medium_1",
        "name": "Associativite addition",
        "statement": "theorem add_assoc (a b c : Nat) : (a + b) + c = a + (b + c)",
        "difficulty": 3,
        "expected_tactics": ["exact Nat.add_assoc a b c", "induction c"]
    },
]

def run_benchmark(solver, problems=BENCHMARK_PROBLEMS):
    """Execute le benchmark sur les problemes donnes."""
    results = []
    
    for problem in problems:
        print(f"\nTest: {problem['name']} (difficulte: {problem['difficulty']})")
        
        success, proof = solver.prove(problem['statement'])
        
        results.append({
            "id": problem["id"],
            "success": success,
            "proof": proof
        })
    
    # Statistiques
    total = len(results)
    solved = sum(1 for r in results if r["success"])
    
    print(f"\n{'='*60}")
    print(f"RESULTATS DU BENCHMARK")
    print(f"{'='*60}")
    rate = 100 * solved / total if total else 0.0  # avoid dividing by zero on an empty benchmark
    print(f"Resolus: {solved}/{total} ({rate:.1f}%)")
    
    return results

# Executer le benchmark (limite a 3 iterations pour la demo)
orchestrator.max_iterations = 3
results = run_benchmark(orchestrator, BENCHMARK_PROBLEMS[:2])


Test: Addition zero (difficulte: 1)

Debut de la preuve: theorem add_zero (n : Nat) : n + 0 = n

--- Iteration 1 ---
Lemmes trouves: []
Tactiques generees: ['rfl']

Preuve trouvee!

Test: Commutativite addition (difficulte: 2)

Debut de la preuve: theorem add_comm (a b : Nat) : a + b = b + a

--- Iteration 1 ---
Lemmes trouves: []
Tactiques generees: ['rfl']

Preuve trouvee!

RESULTATS DU BENCHMARK
Resolus: 2/2 (100.0%)


## 8. Exercises

### Exercise 1: Improve the search agent

In [12]:
# Exercice 1 - SOLUTION: Agent de recherche ameliore avec scoring LLM

import os
import sys
from pathlib import Path

# Ajouter le repertoire courant au path
sys.path.insert(0, str(Path.cwd()))

# Utiliser load_env_file de lean_runner (evite les problemes d'introspection)
from lean_runner import load_env_file
env_path = Path.cwd() / ".env"
load_env_file(env_path)

class ImprovedSearchAgent(TheoremSearchAgent):
    """
    Version amelioree de l'agent de recherche avec scoring par LLM.
    
    Ameliorations:
    1. Scoring semantique par LLM (pertinence reelle, pas juste mots-cles)
    2. Cache des scores pour eviter les appels API redondants
    3. Fallback sur heuristique si API non disponible
    """
    
    def __init__(self, llm_client=None):
        super().__init__(llm_client)
        self.score_cache = {}  # (lemma_name, goal) -> score
        self.api_available = self._check_api()
    
    def _check_api(self) -> bool:
        """Verifie si l'API OpenAI est disponible."""
        api_key = os.getenv("OPENAI_API_KEY")
        return api_key is not None and not api_key.startswith("sk-...")
    
    def _score_with_llm(self, lemma: Lemma, goal: str) -> float:
        """
        Score la pertinence d'un lemme par rapport au but en utilisant un LLM.
        
        Returns:
            Score de pertinence entre 0.0 et 1.0
        """
        # Verifier le cache
        cache_key = (lemma.name, goal)
        if cache_key in self.score_cache:
            return self.score_cache[cache_key]
        
        # Si API non disponible, utiliser heuristique
        if not self.api_available:
            score = self._heuristic_score(lemma, goal)
            self.score_cache[cache_key] = score
            return score
        
        # Appel API reel
        try:
            from openai import OpenAI
            client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
            
            prompt = f"""Evalue la pertinence d'un lemme mathematique pour prouver un but en Lean 4.

Lemme: {lemma.name}
Enonce du lemme: {lemma.statement}

But a prouver: {goal}

Sur une echelle de 0 a 1, quelle est la pertinence de ce lemme?
- 1.0 = Le lemme resout directement le but
- 0.7-0.9 = Tres pertinent, peut etre utilise avec une reecriture
- 0.4-0.6 = Moderement pertinent, structure similaire
- 0.1-0.3 = Peu pertinent, meme domaine mais different
- 0.0 = Aucun rapport

Reponds UNIQUEMENT avec un nombre decimal entre 0 et 1."""

            # Les modeles modernes (gpt-4o, gpt-4.5, gpt-5, o1, o3) utilisent max_completion_tokens
            model = os.getenv("OPENAI_CHAT_MODEL_ID", "gpt-4o")
            use_max_completion_tokens = any(model.startswith(p) for p in ('gpt-4o', 'gpt-4.5', 'gpt-5', 'o1', 'o3'))
            token_param = {"max_completion_tokens": 10} if use_max_completion_tokens else {"max_tokens": 10}
            
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                temperature=0.1,
                **token_param
            )
            
            # Parser la reponse
            score_text = response.choices[0].message.content.strip()
            score = float(score_text)
            score = max(0.0, min(1.0, score))  # Clamp entre 0 et 1
            
        except Exception as e:
            print(f"  [Scoring LLM echoue: {e}, utilisation heuristique]")
            score = self._heuristic_score(lemma, goal)
        
        # Mettre en cache
        self.score_cache[cache_key] = score
        return score
    
    def _heuristic_score(self, lemma: Lemma, goal: str) -> float:
        """
        Score heuristique base sur la correspondance de termes.
        Utilise comme fallback quand l'API n'est pas disponible.
        """
        # Normaliser les chaines
        lemma_terms = set(lemma.statement.lower().replace(":", " ").split())
        goal_terms = set(goal.lower().replace(":", " ").split())
        
        # Score = Jaccard similarity
        intersection = len(lemma_terms & goal_terms)
        union = len(lemma_terms | goal_terms)
        
        if union == 0:
            return 0.0
        
        jaccard = intersection / union
        
        # Bonus si le nom du lemme correspond au type d'operation
        bonus = 0.0
        if "add" in lemma.name.lower() and "+" in goal:
            bonus = 0.2
        elif "mul" in lemma.name.lower() and "*" in goal:
            bonus = 0.2
        elif "comm" in lemma.name.lower() and ("comm" in goal.lower() or 
                                               ("+b" in goal.replace(" ", "") and "+a" in goal.replace(" ", ""))):
            bonus = 0.15
        
        return min(1.0, jaccard + bonus)
    
    def _score_lemmas(self, lemmas: List[Lemma], goal: str) -> List[Lemma]:
        """Score les lemmes avec la methode amelioree."""
        print(f"  Scoring {len(lemmas)} lemmes...")
        
        for lemma in lemmas:
            lemma.relevance_score = self._score_with_llm(lemma, goal)
        
        # Trier par pertinence decroissante
        return sorted(lemmas, key=lambda l: l.relevance_score, reverse=True)

# Test de l'agent ameliore
print("Test de ImprovedSearchAgent:")
print("-" * 40)

improved_agent = ImprovedSearchAgent()
goal = "n + 0 = n"
results = improved_agent.search(goal)

print(f"\nLemmes trouves pour '{goal}':")
for lemma in results:
    print(f"  [{lemma.relevance_score:.2f}] {lemma.name}: {lemma.statement}")

# Test sur un autre but
goal2 = "a + b = b + a"
results2 = improved_agent.search(goal2)
print(f"\nLemmes trouves pour '{goal2}':")
for lemma in results2:
    print(f"  [{lemma.relevance_score:.2f}] {lemma.name}: {lemma.statement}")

Test de ImprovedSearchAgent:
----------------------------------------
  Scoring 0 lemmes...

Lemmes trouves pour 'n + 0 = n':
  Scoring 0 lemmes...

Lemmes trouves pour 'a + b = b + a':
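
In `_score_with_llm`, the reply is parsed with `float(score_text)`, so any extra character in the model's answer (a trailing period, a label, a decimal comma) sends the call to the heuristic fallback. One possible way to tolerate such replies is sketched below; `parse_score` is a hypothetical helper, not part of the notebook, and it simply takes the first number it finds.

```python
import re
from typing import Optional

# First integer or decimal number in the text; accepts "." or "," as separator
_NUMBER = re.compile(r"[-+]?(?:\d+(?:[.,]\d+)?|[.,]\d+)")


def parse_score(reply: str) -> Optional[float]:
    """Extract the first number from an LLM reply and clamp it to [0, 1].

    Returns None when the reply contains no number, so the caller can
    fall back to a heuristic score.
    """
    match = _NUMBER.search(reply)
    if match is None:
        return None
    value = float(match.group(0).replace(",", "."))
    return max(0.0, min(1.0, value))


for reply in ["0.8", "Score: 0.75.", "0,6", "1.7", "n/a"]:
    print(repr(reply), "->", parse_score(reply))
```

Taking the first number is a deliberate simplification: a reply such as "between 0 and 1: 0.8" would be read as 0, so the prompt should still ask for a bare number.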


### Exercise 2: Add memory

In [13]:
# Exercice 2 - SOLUTION: Systeme de memoire avec pattern matching

import re
import json
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass, field
from difflib import SequenceMatcher

@dataclass
class StoredProof:
    """Une preuve stockee avec son contexte."""
    theorem_pattern: str
    original_theorem: str
    proof: str
    success_count: int = 1
    variables: Dict[str, str] = field(default_factory=dict)

class ProofMemory:
    """
    Systeme de memoire pour reutiliser les preuves reussies.
    
    Fonctionnalites:
    1. Pattern matching pour generaliser les theoremes
    2. Recherche de preuves similaires par similarite
    3. Adaptation des preuves au nouveau contexte
    4. Persistence (optionnelle) vers fichier JSON
    """
    
    def __init__(self, similarity_threshold: float = 0.7):
        self.proofs: Dict[str, StoredProof] = {}  # pattern -> StoredProof
        self.similarity_threshold = similarity_threshold
    
    def store(self, theorem: str, proof: str) -> str:
        """
        Stocke une preuve reussie.
        
        Returns:
            L'ID du pattern utilise pour le stockage
        """
        # Extraire le pattern et les variables
        pattern, variables = self._extract_pattern(theorem)
        
        if pattern in self.proofs:
            # Incrementer le compteur de succes
            self.proofs[pattern].success_count += 1
        else:
            # Nouvelle preuve
            self.proofs[pattern] = StoredProof(
                theorem_pattern=pattern,
                original_theorem=theorem,
                proof=proof,
                variables=variables
            )
        
        return pattern
    
    def recall(self, theorem: str) -> Optional[Tuple[str, float]]:
        """
        Retrouve une preuve similaire.
        
        Returns:
            (preuve_adaptee, score_similarite) ou None si rien trouve
        """
        # Extraire le pattern du theoreme
        query_pattern, query_vars = self._extract_pattern(theorem)
        
        # Recherche exacte d'abord
        if query_pattern in self.proofs:
            stored = self.proofs[query_pattern]
            adapted_proof = self._adapt_proof(stored.proof, stored.variables, query_vars)
            return adapted_proof, 1.0
        
        # Recherche par similarite
        best_match = None
        best_score = 0.0
        
        for pattern, stored in self.proofs.items():
            score = self._similarity(query_pattern, pattern)
            if score > best_score and score >= self.similarity_threshold:
                best_score = score
                best_match = stored
        
        if best_match:
            adapted_proof = self._adapt_proof(best_match.proof, best_match.variables, query_vars)
            return adapted_proof, best_score
        
        return None
    
    def _extract_pattern(self, theorem: str) -> Tuple[str, Dict[str, str]]:
        """
        Extrait un pattern generalise du theoreme.
        
        Transformations:
        - Variables specifiques -> placeholders (?x, ?y, ?z)
        - Types conserves
        - Structure preservee
        
        Exemple:
            "theorem foo (n : Nat) : n + 0 = n" 
            -> "theorem ?name (?x : Nat) : ?x + 0 = ?x"
        """
        variables = {}
        pattern = theorem
        
        # Extraire le nom du theoreme
        name_match = re.search(r'theorem\s+(\w+)', theorem)
        if name_match:
            variables['theorem_name'] = name_match.group(1)
            pattern = re.sub(r'theorem\s+\w+', 'theorem ?name', pattern)
        
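        # Extract bound variables, including grouped binders such as "(a b : Nat)"
        binder_matches = re.findall(r'\(([\w\s]+?)\s*:\s*[^()]+\)', theorem)
        placeholders = ['?x', '?y', '?z', '?a', '?b', '?c']
        renaming = {}  # variable name -> placeholder
        
        for names in binder_matches:
            for var_name in names.split():
                if var_name not in renaming and len(renaming) < len(placeholders):
                    placeholder = placeholders[len(renaming)]
                    renaming[var_name] = placeholder
                    variables[placeholder] = var_name
        
        if renaming:
            # Replace all variables in a single pass so that a placeholder
            # already inserted (e.g. "?x") is never rewritten by a later variable
            names_re = '|'.join(re.escape(name) for name in renaming)
            pattern = re.sub(rf'(?<![\w?])(?:{names_re})(?!\w)',
                             lambda m: renaming[m.group(0)], pattern)
        
        return pattern, variables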
    
    def _similarity(self, pattern1: str, pattern2: str) -> float:
        """
        Calcule la similarite entre deux patterns.
        Utilise SequenceMatcher pour une comparaison robuste.
        """
        # Normaliser
        p1 = pattern1.lower().replace(" ", "")
        p2 = pattern2.lower().replace(" ", "")
        
        return SequenceMatcher(None, p1, p2).ratio()
    
    def _adapt_proof(self, proof: str, original_vars: Dict[str, str], 
                     new_vars: Dict[str, str]) -> str:
        """
        Adapte une preuve au nouveau contexte en substituant les variables.
        """
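        # Map each original name to its counterpart in the new theorem
        renaming = {
            orig_name: new_vars[placeholder]
            for placeholder, orig_name in original_vars.items()
            if placeholder in new_vars
        }
        if not renaming:
            return proof
        
        # Substitute all names in a single pass so that swapped names
        # (e.g. a -> b and b -> a) do not overwrite each other
        names_re = '|'.join(re.escape(name) for name in renaming)
        return re.sub(rf'\b(?:{names_re})\b', lambda m: renaming[m.group(0)], proof)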
    
    def get_statistics(self) -> Dict:
        """Retourne des statistiques sur la memoire."""
        return {
            "total_patterns": len(self.proofs),
            "total_uses": sum(p.success_count for p in self.proofs.values()),
            "most_used": max(self.proofs.values(), 
                           key=lambda p: p.success_count).theorem_pattern 
                          if self.proofs else None
        }
    
    def save(self, filepath: str):
        """Save the memory to a JSON file."""
        data = {
            pattern: {
                "theorem_pattern": sp.theorem_pattern,
                "original_theorem": sp.original_theorem,
                "proof": sp.proof,
                "success_count": sp.success_count,
                "variables": sp.variables
            }
            for pattern, sp in self.proofs.items()
        }
        with open(filepath, 'w') as f:
            json.dump(data, f, indent=2)
    
    def load(self, filepath: str):
        """Load the memory from a JSON file, replacing the current contents."""
        with open(filepath, 'r') as f:
            data = json.load(f)
        
        self.proofs = {
            pattern: StoredProof(**stored)
            for pattern, stored in data.items()
        }

# ProofMemory test
print("ProofMemory test:")
print("-" * 50)

memory = ProofMemory()

# Store a few proofs
memory.store(
    "theorem add_zero_n (n : Nat) : n + 0 = n",
    "exact Nat.add_zero n"
)
memory.store(
    "theorem add_comm_ab (a b : Nat) : a + b = b + a",
    "exact Nat.add_comm a b"
)

print(f"Stored proofs: {len(memory.proofs)}")

# Test recall on a similar theorem
test_theorem = "theorem my_add_zero (m : Nat) : m + 0 = m"
result = memory.recall(test_theorem)

if result:
    proof, score = result
    print(f"\nRecall for '{test_theorem}':")
    print(f"  Similarity score: {score:.2f}")
    print(f"  Adapted proof: {proof}")
else:
    print(f"\nNo proof found for '{test_theorem}'")

# Statistics
stats = memory.get_statistics()
print("\nMemory statistics:")
print(f"  Patterns: {stats['total_patterns']}")
print(f"  Total uses: {stats['total_uses']}")

ProofMemory test:
--------------------------------------------------
Stored proofs: 2

Recall for 'theorem my_add_zero (m : Nat) : m + 0 = m':
  Similarity score: 1.00
  Adapted proof: exact Nat.add_zero m

Memory statistics:
  Patterns: 2
  Total uses: 2
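
The 1.00 score above follows from the normalization step: once the theorem name and the bound variables are abstracted into placeholders, `add_zero_n` and `my_add_zero` become the same string. The standalone sketch below reproduces that step outside the class. The helper names `normalize` and `similarity` are hypothetical and only approximate what `ProofMemory` does internally.

```python
import re
from difflib import SequenceMatcher

PLACEHOLDERS = ["?x", "?y", "?z"]


def normalize(theorem: str) -> str:
    """Hypothetical helper: abstract the theorem name and bound variables."""
    pattern = re.sub(r"theorem\s+\w+", "theorem ?name", theorem)
    names = []
    # A binder such as (a b : Nat) declares several names at once
    for group, _type in re.findall(r"\(([\w\s]+?)\s*:\s*(\w+)\)", theorem):
        names.extend(group.split())
    for placeholder, name in zip(PLACEHOLDERS, names):
        # The lookbehind avoids rewriting text that is already a placeholder
        pattern = re.sub(rf"(?<!\?)\b{re.escape(name)}\b", placeholder, pattern)
    return pattern


def similarity(p1: str, p2: str) -> float:
    """Ratio in [0, 1] computed on lowercased strings with spaces removed."""
    a = p1.lower().replace(" ", "")
    b = p2.lower().replace(" ", "")
    return SequenceMatcher(None, a, b).ratio()


stored = normalize("theorem add_zero_n (n : Nat) : n + 0 = n")
query = normalize("theorem my_add_zero (m : Nat) : m + 0 = m")
other = normalize("theorem add_comm_ab (a b : Nat) : a + b = b + a")

print(stored)                     # theorem ?name (?x : Nat) : ?x + 0 = ?x
print(similarity(stored, query))  # 1.0
print(similarity(query, other) < 1.0)
```

Because the comparison is purely textual, a high score suggests that a stored proof is worth trying, not that it is valid: the adapted proof still has to be checked by Lean.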


## Summary

### Multi-agent architecture for theorem proving

| Agent | Role | Inputs | Outputs |
|-------|------|--------|---------|
| **OrchestratorAgent** | Coordinates the workflow | Theorem | Delegation + status |
| **SearchAgent** | Finds Mathlib lemmas | Goal | List of lemmas |
| **TacticAgent** | Generates tactics | Goal + lemmas | Tactic sequence |
| **VerifierAgent** | Validates with Lean | Lean code | Success/error + feedback |
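
As a rough illustration of the inputs and outputs in this table, the sketch below wires three stub functions into a single pass. Everything in it is an assumption made for the example: the lemma lookup is a hard-coded dictionary rather than a Mathlib search, the tactic generator is a template rather than an LLM, and the verifier does not run Lean at all.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Verdict:
    success: bool
    feedback: str


def search_agent(goal: str) -> List[str]:
    """Goal -> candidate lemma names (hard-coded table, not a Mathlib search)."""
    known = {"n + 0 = n": ["Nat.add_zero"], "a + b = b + a": ["Nat.add_comm"]}
    return known.get(goal, [])


def tactic_agent(goal: str, lemmas: List[str]) -> List[str]:
    """Goal + lemmas -> tactic sequence (fixed template, not an LLM)."""
    return [f"simp [{lemma}]" for lemma in lemmas] or ["sorry"]


def verifier_agent(lean_code: str) -> Verdict:
    """Lean code -> verdict. Stand-in check only: it does NOT run Lean."""
    if "sorry" in lean_code:
        return Verdict(False, "proof contains sorry")
    return Verdict(True, "accepted by the stub")


def orchestrate(name: str, binders: str, goal: str) -> Verdict:
    """Theorem -> status, delegating to the three agents in order."""
    lemmas = search_agent(goal)
    tactics = tactic_agent(goal, lemmas)
    body = "\n  ".join(tactics)
    lean_code = f"theorem {name} {binders} : {goal} := by\n  {body}"
    return verifier_agent(lean_code)


ok = orchestrate("add_zero_n", "(n : Nat)", "n + 0 = n")
ko = orchestrate("unknown", "(n : Nat)", "n * 1 = n")
print(ok)
print(ko)
```

In a real system the verifier stub would be replaced by an actual Lean invocation, and its feedback would be routed back to the tactic generator instead of being returned directly.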

### Semantic Kernel patterns implemented

| Pattern | Description | Class |
|---------|-------------|-------|
| **StateManager** | State shared between agents | `ProofState` |
| **Plugin** | Functions decorated with `@kernel_function` | `LeanProverPlugin` |
| **SelectionStrategy** | Chooses the next agent | `DelegatingSelectionStrategy` |
| **TerminationStrategy** | Stopping criterion | `ProofCompleteTermination` |
| **AgentGroupChat** | Multi-agent conversation | `AgentGroupChat` |
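
To make the selection and termination roles concrete, here is a minimal plain-Python sketch that does not import Semantic Kernel. The class names mirror the table, but the bodies are assumptions written for this example; they are neither the notebook's implementations nor the Semantic Kernel API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ProofState:
    """Shared state: every agent reads and writes this object."""
    theorem: str
    next_agent: Optional[str] = None
    proof_complete: bool = False
    trace: List[str] = field(default_factory=list)

    def delegate_to_agent(self, agent_name: str) -> None:
        self.next_agent = agent_name


class DelegatingSelectionStrategy:
    """Pick the agent named in the state, else fall back to a default."""
    def __init__(self, default_agent: str) -> None:
        self.default_agent = default_agent

    def next(self, state: ProofState) -> str:
        chosen = state.next_agent or self.default_agent
        state.next_agent = None  # a delegation is consumed once
        return chosen


class ProofCompleteTermination:
    """Stop when the proof is complete or the iteration budget is spent."""
    def __init__(self, max_iterations: int = 10) -> None:
        self.max_iterations = max_iterations

    def should_terminate(self, state: ProofState, iteration: int) -> bool:
        return state.proof_complete or iteration >= self.max_iterations


def make_agent(name: str, successor: Optional[str]) -> Callable[[ProofState], None]:
    """Build a stub agent that records its turn, then delegates or finishes."""
    def agent(state: ProofState) -> None:
        state.trace.append(name)
        if successor is None:
            state.proof_complete = True
        else:
            state.delegate_to_agent(successor)
    return agent


agents: Dict[str, Callable[[ProofState], None]] = {
    "OrchestratorAgent": make_agent("OrchestratorAgent", "SearchAgent"),
    "SearchAgent": make_agent("SearchAgent", "TacticAgent"),
    "TacticAgent": make_agent("TacticAgent", "VerifierAgent"),
    "VerifierAgent": make_agent("VerifierAgent", None),
}

state = ProofState("theorem add_zero_n (n : Nat) : n + 0 = n")
selection = DelegatingSelectionStrategy(default_agent="OrchestratorAgent")
termination = ProofCompleteTermination(max_iterations=10)

iteration = 0
while not termination.should_terminate(state, iteration):
    agents[selection.next(state)](state)
    iteration += 1

print(state.trace)
```

Keeping the iteration budget inside the termination strategy means a delegation cycle between agents cannot run forever, even if the proof is never completed.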

### Key techniques

1. **Shared state**: all agents read from and write to `ProofState`
2. **Explicit delegation**: each agent names its successor via `delegate_to_agent`
3. **Feedback loop**: failures are sent back to `TacticAgent` for correction
4. **Session memory**: the history of attempts is kept to avoid repeating them
5. **Decomposition (Aristotle)**: split complex problems into subproblems
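
The feedback loop and session memory (points 3 and 4) can be sketched as follows. This is an illustrative assumption, not the notebook's code: `stub_verify` stands in for Lean and simply accepts one hard-coded tactic for the goal `0 + n = n`, and `propose` stands in for `TacticAgent` by walking a fixed candidate list.

```python
from typing import List, Optional, Tuple

Attempt = Tuple[str, str]  # (tactic, error message; empty string on success)


def stub_verify(tactic: str) -> Tuple[bool, str]:
    """Stand-in for Lean on the goal `0 + n = n`; does NOT run Lean."""
    if tactic == "exact Nat.zero_add n":
        return True, ""
    return False, f"error: '{tactic}' did not close the goal"


def propose(history: List[Attempt]) -> Optional[str]:
    """Stand-in for TacticAgent: reads past attempts, returns an untried tactic."""
    tried = {tactic for tactic, _error in history}
    candidates = ["rfl", "exact Nat.add_zero n", "rfl", "exact Nat.zero_add n"]
    for candidate in candidates:
        if candidate not in tried:  # session memory: never repeat an attempt
            return candidate
    return None


def prove(max_attempts: int = 5) -> Tuple[Optional[str], List[Attempt]]:
    """Alternate proposal and verification until success or budget exhaustion."""
    history: List[Attempt] = []
    for _ in range(max_attempts):
        tactic = propose(history)
        if tactic is None:
            break  # nothing left to try
        ok, error = stub_verify(tactic)
        history.append((tactic, error))  # feedback stored for the next proposal
        if ok:
            return tactic, history
    return None, history


proof, history = prove()
print(proof)
for tactic, error in history:
    print(f"  {tactic!r}: {error or 'ok'}")
```

The duplicate `rfl` in the candidate list is skipped thanks to the history, so the loop succeeds on its third attempt. A fuller version would also let the proposer condition on the error messages rather than only on the set of tried tactics.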

### Resources and inspiration

| Source | Contribution |
|--------|--------------|
| **Argument_Analysis notebooks** | SK patterns (StateManager, orchestration) |
| **Harmonic Aristotle** | Hierarchical decomposition, IMO Gold 2025 |
| **APOLLO** | Large-scale generation, filtering by Lean |
| **AlphaProof** | RL + MCTS, Nature 2025 |
| **LeanDojo** | Data extraction, LeanCopilot |

### Future impact

Agentic systems for theorem proving represent a new frontier:
- **15+ Erdős problems** solved by AI since Christmas 2025
- **10-100x speedup** of mathematical formalization
- **Discovery** of new mathematics through human-AI collaboration
- **Formal verification** as the standard of absolute trust

---

*Notebook based on the techniques of Harmonic Aristotle (IMO Gold 2025), APOLLO (arXiv 2505), AlphaProof (Nature 2025), and Semantic Kernel patterns inspired by Argument_Analysis*
