# 🖥️ CLI Examples - PTOF Analysis System

This notebook collects the code snippets for running the CLI scripts of the PTOF analysis system.

**⚙️ Configuration**: All commands use the project's `.venv` virtual environment.

**📂 Directory Structure**:
- `ptof_inbox/` → PDFs to analyze (PUT THEM HERE)
- `ptof_processed/` → PDFs archived automatically

## 📑 Contents
1. [🚀 **Automatic Workflow**](#workflow-automatico) ⭐ PRIORITY
2. [🔗 Multi-Agent Batch Analysis](#multi-agent-batch)
3. [☁️ Cloud Agent](#cloud-agent)
4. [🔍 Analysis and Review](#analysis-review)
5. [🛠️ Utilities](#utilities)
6. [🤖 Background Automation](#background-automation)
7. [📊 Diagnostics](#diagnostics)

---
## 🚀 Automatic Workflow {#workflow-automatico}

### ⭐ Full Pipeline: Inbox → Processed

**What it does**:
1. 📥 Reads PDFs from `ptof_inbox/`
2. 📝 Converts PDF → Markdown
3. 🤖 Analyzes them with the multi-agent pipeline
4. 📦 Moves the PDFs to `ptof_processed/batch_TIMESTAMP/`
5. 📊 Updates the dashboard CSV
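
Step 4 can be pictured with a minimal sketch. This is not the code in `workflow_ptof.py`: the function name `archive_pdfs` and the `YYYYMMDD_HHMMSS` timestamp format are assumptions made for illustration only.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def archive_pdfs(inbox: Path, processed: Path, now: Optional[datetime] = None) -> List[Path]:
    """Move every PDF from `inbox` into `processed/batch_<timestamp>/` (hypothetical helper)."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    pdfs = sorted(inbox.glob("*.pdf"))
    if not pdfs:
        return []  # nothing to archive, so no empty batch folder is created
    batch_dir = processed / f"batch_{stamp}"
    batch_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in pdfs:
        target = batch_dir / pdf.name
        shutil.move(str(pdf), str(target))  # files that are not PDFs stay in the inbox
        moved.append(target)
    return moved
```

Calling `archive_pdfs(Path("ptof_inbox"), Path("ptof_processed"))` would then return the new locations of the archived files.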

In [None]:
%%bash
# FULL WORKFLOW - Inbox → Processed
cd /Users/danieledragoni/git/LIste
source .venv/bin/activate
mkdir -p logs  # tee needs the log directory to exist

echo "🚀 PTOF AUTOMATIC WORKFLOW"
echo "📥 Input: ptof_inbox/"
echo "✅ Output: ptof_processed/batch_TIMESTAMP/"
echo "========================================"

python workflow_ptof.py 2>&1 | tee logs/workflow_ptof.log

echo "========================================"
echo "✅ Workflow finished!"
echo "📋 Log: logs/workflow_ptof.log"

🚀 PTOF AUTOMATIC WORKFLOW
📥 Input: ptof_inbox/
✅ Output: ptof_processed/batch_TIMESTAMP/
2025-12-22 00:39:43,545 - INFO - 📂 Directory verificata: ptof_inbox/
2025-12-22 00:39:43,545 - INFO - 📂 Directory verificata: ptof_processed/
2025-12-22 00:39:43,545 - INFO - 📂 Directory verificata: ptof_md/
2025-12-22 00:39:43,545 - INFO - 📂 Directory verificata: analysis_results/
2025-12-22 00:39:43,545 - INFO - 📂 Directory verificata: logs/
2025-12-22 00:39:43,545 - INFO - 🚀 AVVIO WORKFLOW PTOF COMPLETO
2025-12-22 00:39:43,545 - INFO - 🕐 Timestamp: 2025-12-22 00:39:43
2025-12-22 00:39:43,546 - INFO - 
📊 STATO INIZIALE:
2025-12-22 00:39:43,546 - INFO -   PDF in inbox: 1
2025-12-22 00:39:43,546 - INFO -   PDF processati: 0
2025-12-22 00:39:43,546 - INFO -   File Markdown: 0
2025-12-22 00:39:43,547 - INFO -   File analisi: 0
2025-12-22 00:39:43,547 - INFO - 📝 STEP 1: Conversione PDF → Markdown
2025-12-22 00:39:43,547 - INFO - 📄 Trovati 1 PDF da convertire
20

---
### Post-Analysis Step: Find Naming Inconsistencies

Run this step **after** the workflow (PDF → MD → Analysis).
It lists files whose names do not match `metadata.school_id`.


In [None]:
import os
import json
import glob

ANALYSIS_DIR = "analysis_results"
MD_DIR = "ptof_md"

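def normalize_school_id(value):
    """Return the school ID as a stripped string ('' when missing)."""
    if value is None:
        return ""
    return str(value).strip()

def is_valid_school_id(value):
    """Reject empty IDs and placeholder values such as 'ND' or 'N/A'."""
    value = normalize_school_id(value)
    if not value or value.upper() in {"ND", "N/A", "NONE", "UNKNOWN"}:
        return False
    return True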

def build_issues():
    issues = []
    json_files = glob.glob(os.path.join(ANALYSIS_DIR, "*_analysis.json"))
    for json_path in sorted(json_files):
        filename = os.path.basename(json_path)
        base = filename.replace("_analysis.json", "")
        try:
            with open(json_path, "r", encoding="utf-8") as f:
                data = json.load(f)
        except Exception as e:
            issues.append({
                "kind": "json_read_error",
                "current": json_path,
                "target": "",
                "reason": f"json read error: {e}",
            })
            continue
        school_id = data.get("metadata", {}).get("school_id")
        if not is_valid_school_id(school_id):
            issues.append({
                "kind": "invalid_school_id",
                "current": json_path,
                "target": "",
                "reason": f"invalid metadata.school_id: {school_id}",
            })
            continue
        school_id = normalize_school_id(school_id)
        if base != school_id:
            target_json = os.path.join(ANALYSIS_DIR, f"{school_id}_analysis.json")
            if os.path.exists(target_json):
                issues.append({
                    "kind": "conflict_json",
                    "current": json_path,
                    "target": target_json,
                    "reason": f"target already exists for {school_id}",
                })
            else:
                issues.append({
                    "kind": "rename_json",
                    "current": json_path,
                    "target": target_json,
                    "reason": f"base name {base} != metadata.school_id",
                })

            old_analysis_md = os.path.join(ANALYSIS_DIR, f"{base}_analysis.md")
            new_analysis_md = os.path.join(ANALYSIS_DIR, f"{school_id}_analysis.md")
            if os.path.exists(old_analysis_md):
                if os.path.exists(new_analysis_md):
                    issues.append({
                        "kind": "conflict_analysis_md",
                        "current": old_analysis_md,
                        "target": new_analysis_md,
                        "reason": f"target already exists for {school_id}",
                    })
                else:
                    issues.append({
                        "kind": "rename_analysis_md",
                        "current": old_analysis_md,
                        "target": new_analysis_md,
                        "reason": f"base name {base} != metadata.school_id",
                    })

            old_md = os.path.join(MD_DIR, f"{base}.md")
            new_md = os.path.join(MD_DIR, f"{school_id}.md")
            if os.path.exists(old_md):
                if os.path.exists(new_md):
                    issues.append({
                        "kind": "conflict_md",
                        "current": old_md,
                        "target": new_md,
                        "reason": f"target already exists for {school_id}",
                    })
                else:
                    issues.append({
                        "kind": "rename_md",
                        "current": old_md,
                        "target": new_md,
                        "reason": f"base name {base} != metadata.school_id",
                    })
    return issues

issues = build_issues()
print(f"Total inconsistencies found: {len(issues)}")
for idx, issue in enumerate(issues, 1):
    target = issue.get("target") or "-"
    print(f"[{idx:02d}] {issue['kind']} | {os.path.basename(issue['current'])} -> {os.path.basename(target)} | {issue['reason']}")


---
### Post-Analysis Step: Fix Selected Inconsistencies

**Correction plan**
1. Shows the plan and asks which IDs to fix.
2. Renames only the selected entries (never overwriting an existing file).
3. Prints a final summary.

Leave `dry_run = True` for a simulation that renames nothing; set it to `False` to apply the renames.


In [None]:
def safe_rename(src, dst, dry_run=False):
    if os.path.abspath(src) == os.path.abspath(dst):
        return False
    if os.path.exists(dst):
        print(f"⚠️ Skip rename (dest exists): {os.path.basename(dst)}")
        return False
    if dry_run:
        print(f"[DRY] {os.path.basename(src)} -> {os.path.basename(dst)}")
        return True
    os.rename(src, dst)
    print(f"✅ Renamed: {os.path.basename(src)} -> {os.path.basename(dst)}")
    return True

if not issues:
    print("No inconsistencies to fix.")
else:
    print("Plan: 1) select IDs  2) rename  3) summary")
    selection = input("Which IDs do you want to fix? (e.g. 1,3,5 | all | none): ").strip().lower()
    if selection in {"", "none", "no"}:
        print("No fixes applied.")
    else:
        if selection == "all":
            ids = list(range(1, len(issues) + 1))
        else:
            ids = []
            for part in selection.split(','):
                part = part.strip()
                if part.isdigit():
                    ids.append(int(part))
        dry_run = True  # set to False to actually rename the files
        renamed = 0
        for idx in ids:
            if idx < 1 or idx > len(issues):
                continue
            issue = issues[idx - 1]
            if not issue['kind'].startswith('rename_'):
                print(f"- Skip {idx:02d} ({issue['kind']})")
                continue
            if safe_rename(issue['current'], issue['target'], dry_run=dry_run):
                renamed += 1
        print(f"Renamed: {renamed} (dry_run={dry_run})")
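
The ID selection handled inline in the cell above could also be factored into a small function that is easy to test without `input()`. This is a sketch, not part of the project: the name `parse_selection` is hypothetical, and unlike the cell above it also drops duplicate and out-of-range IDs up front.

```python
from typing import List


def parse_selection(selection: str, total: int) -> List[int]:
    """Turn '1,3,5', 'all' or 'none' into a sorted list of valid 1-based IDs."""
    text = selection.strip().lower()
    if text in {"", "none", "no"}:
        return []
    if text == "all":
        return list(range(1, total + 1))
    ids = set()
    for part in text.split(","):
        part = part.strip()
        # isdecimal() only accepts characters that int() can parse
        if part.isdecimal() and 1 <= int(part) <= total:
            ids.add(int(part))
    return sorted(ids)
```

With such a helper, the loop over `ids` no longer needs its own range check.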


### Check Directory Status

In [None]:
%%bash
cd /Users/danieledragoni/git/LIste

echo "📊 DIRECTORY STATUS"
echo "========================================"
echo "📥 Inbox PDFs: $(find ptof_inbox -name '*.pdf' 2>/dev/null | wc -l)"
echo "✅ Processed PDFs: $(find ptof_processed -name '*.pdf' 2>/dev/null | wc -l)"
echo "📝 Markdown: $(find ptof_md -name '*.md' 2>/dev/null | wc -l)"
echo "📊 Analysis JSON: $(find analysis_results -name '*.json' 2>/dev/null | wc -l)"
echo "========================================"
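
The same counts can be collected from Python, which is handy when the numbers feed later cells. The sketch below mirrors the recursive `find` calls above; the names `PATTERNS` and `count_files` are illustrative and not part of the project.

```python
from pathlib import Path

# Directory -> glob pattern, mirroring the `find` calls in the bash cell.
PATTERNS = {
    "ptof_inbox": "*.pdf",
    "ptof_processed": "*.pdf",
    "ptof_md": "*.md",
    "analysis_results": "*.json",
}


def count_files(root=".", patterns=PATTERNS):
    """Count matching files under each directory, recursively; missing directories count as 0."""
    root = Path(root)
    counts = {}
    for directory, pattern in patterns.items():
        folder = root / directory
        if folder.is_dir():
            counts[directory] = sum(1 for p in folder.rglob(pattern) if p.is_file())
        else:
            counts[directory] = 0
    return counts
```

Running `count_files("/Users/danieledragoni/git/LIste")` would return a dictionary with one count per directory.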

---
## 🔗 Multi-Agent Batch Analysis {#multi-agent-batch}

### Multi-Agent Pipeline for a Whole Folder
Processes every PTOF file in a folder with the multi-agent architecture (Analyst → Reviewer → Refiner → Synthesizer).
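
To illustrate the staged flow only, here is a toy sketch in which each stage receives the state produced by the previous one. It does not reproduce `app/agentic_pipeline.py`: `run_stages` and the four stand-in functions are hypothetical, and the real agents call a language model rather than formatting strings.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]


def run_stages(document: Dict, stages: List[Stage]) -> Dict:
    """Feed the state through each stage in order and return the final state."""
    state = dict(document)
    for stage in stages:
        state = stage(state)
    return state


# Stand-in stages (assumptions): each one adds or condenses fields in the state.
def analyst(state: Dict) -> Dict:
    return {**state, "draft": f"analysis of {len(state['text'])} chars"}


def reviewer(state: Dict) -> Dict:
    return {**state, "issues": [] if state["text"] else ["empty document"]}


def refiner(state: Dict) -> Dict:
    suffix = " (needs review)" if state["issues"] else ""
    return {**state, "refined": state["draft"] + suffix}


def synthesizer(state: Dict) -> Dict:
    return {"summary": state["refined"], "issues": state["issues"]}
```

`run_stages({"text": "abc"}, [analyst, reviewer, refiner, synthesizer])` returns the synthesizer's output, showing how each stage builds on the one before it.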

In [None]:
%%bash
# Multi-agent pipeline - full batch on the ptof_md/ folder
cd /Users/danieledragoni/git/LIste
source .venv/bin/activate
mkdir -p logs  # tee needs the log directory to exist

echo "🚀 Starting Multi-Agent Batch Pipeline"
echo "📂 Directory: ptof_md/"
echo "🤖 Agents: Analyst → Reviewer → Refiner → Synthesizer"
echo "========================================"

python app/agentic_pipeline.py 2>&1 | tee logs/agentic_pipeline.log

echo "========================================"
echo "✅ Pipeline finished!"
echo "📋 Log: logs/agentic_pipeline.log"

### Multi-Agent Pipeline - Single File with Logging

In [None]:
# Multi-agent pipeline - single-file mode with detailed logging
import sys
import os
import logging
from datetime import datetime

# Set up logging (force=True replaces handlers left over from earlier cells)
os.makedirs('logs', exist_ok=True)
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('logs/single_analysis.log', encoding='utf-8'),
        logging.StreamHandler()
    ],
    force=True
)
logger = logging.getLogger(__name__)

# Add the project path
sys.path.insert(0, '/Users/danieledragoni/git/LIste')

from app.agentic_pipeline import (
    process_single_ptof,
    AnalystAgent,
    ReviewerAgent,
    RefinerAgent,
    SynthesizerAgent
)

logger.info("🚀 Initializing the multi-agent pipeline agents")

# Initialize the agents
analyst = AnalystAgent()
reviewer = ReviewerAgent()
refiner = RefinerAgent()
synthesizer = SynthesizerAgent()

logger.info("✅ Agents initialized successfully")

# Status callback with logging
def status_callback(msg):
    logger.info(f"[PIPELINE] {msg}")

# Process a single PTOF file
md_file = "ptof_md/MIIS08900V.md"  # Change to your file
logger.info(f"📄 Processing file: {md_file}")

start_time = datetime.now()

result = process_single_ptof(
    md_file=md_file,
    analyst=analyst,
    reviewer=reviewer,
    refiner=refiner,
    synthesizer=synthesizer,
    status_callback=status_callback
)

elapsed = (datetime.now() - start_time).total_seconds()

logger.info("="*80)
logger.info(f"✅ Analysis finished in {elapsed:.1f} seconds!")
logger.info(f"📊 Result keys: {list(result.keys()) if result else 'None'}")
logger.info("📋 Log saved to: logs/single_analysis.log")

if result:
    metadata = result.get('metadata', {})
    logger.info(f"🏫 School: {metadata.get('denominazione', 'N/A')}")
    logger.info(f"📍 Municipality: {metadata.get('comune', 'N/A')}")

---
## ☁️ Cloud Agent {#cloud-agent}

### Cloud Analysis with Logging

In [None]:
# Cloud Agent - full analysis with detailed logging
import os
import sys
import logging
from datetime import datetime

sys.path.insert(0, '/Users/danieledragoni/git/LIste')

# Set up logging (force=True replaces handlers left over from earlier cells)
os.makedirs('logs', exist_ok=True)
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('logs/cloud_analysis.log', encoding='utf-8'),
        logging.StreamHandler()
    ],
    force=True
)
logger = logging.getLogger(__name__)

from src.processing.cloud_review import review_ptof_with_cloud, load_api_config

logger.info("☁️ Initializing Cloud Agent")

# Load the API configuration from the JSON file
config = load_api_config()
provider = config.get('default_provider', 'gemini')
api_key_field = f'{provider}_api_key'

logger.info(f"🔐 Provider: {provider}")
logger.info(f"🔑 API Key: {'✅ Configured' if config.get(api_key_field) else '❌ Missing'}")

# Load the content of the MD file
md_file = 'ptof_md/MIIS08900V.md'
logger.info(f"📄 Loading file: {md_file}")

with open(md_file, 'r', encoding='utf-8') as f:
    text = f.read()

logger.info(f"📏 Document size: {len(text):,} characters")

# Run the full analysis
start_time = datetime.now()
logger.info("🚀 Starting cloud analysis...")

result = review_ptof_with_cloud(
    md_content=text,
    provider=provider,
    api_key=config.get(api_key_field),
    model="gemini-2.0-flash-exp" if provider == 'gemini' else "google/gemini-2.0-flash-exp:free",
    school_id="MIIS08900V"
)

elapsed = (datetime.now() - start_time).total_seconds()

logger.info("="*80)
logger.info(f"✅ Analysis finished in {elapsed:.1f} seconds!")

if result:
    metadata = result.get('metadata', {})
    logger.info(f"🏫 School ID: {metadata.get('school_id', 'N/A')}")
    logger.info(f"📛 Name: {metadata.get('denominazione', 'N/A')}")
    logger.info(f"📍 Municipality: {metadata.get('comune', 'N/A')}")
    logger.info(f"🗺️ Area: {metadata.get('area_geografica', 'N/A')}")

logger.info("📋 Log saved to: logs/cloud_analysis.log")

---
## 🔍 Analysis and Review {#analysis-review}

### Rebuild CSV with Logging

In [None]:
%%bash
# Rebuild CSV - rebuild the index from the JSON files
cd /Users/danieledragoni/git/LIste
source .venv/bin/activate
mkdir -p logs  # tee needs the log directory to exist

echo "📊 Rebuilding CSV from JSON files"
echo "📂 Input: analysis_results/*.json"
echo "📄 Output: data/analysis_summary.csv"
echo "========================================"

python src/processing/rebuild_csv.py 2>&1 | tee logs/rebuild_csv.log

echo "========================================"
echo "✅ CSV rebuilt!"
echo "📋 Log: logs/rebuild_csv.log"
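
As a rough picture of what such a rebuild involves, the sketch below collects the `metadata` block of each `*_analysis.json` file into one CSV row. It is not the code in `src/processing/rebuild_csv.py`: the function name `rebuild_summary` and the column list are assumptions, chosen from the metadata fields used elsewhere in this notebook.

```python
import csv
import json
from pathlib import Path

# Assumed columns: metadata keys that appear in other cells of this notebook.
FIELDS = ["school_id", "denominazione", "comune", "area_geografica"]


def rebuild_summary(analysis_dir, csv_path):
    """Write one CSV row per readable analysis JSON and return the row count."""
    rows = []
    for json_path in sorted(Path(analysis_dir).glob("*_analysis.json")):
        try:
            data = json.loads(json_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed file: skip it
        metadata = data.get("metadata") if isinstance(data, dict) else None
        if not isinstance(metadata, dict):
            metadata = {}
        rows.append({field: metadata.get(field, "") for field in FIELDS})
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Because the CSV is regenerated from scratch on every call, the JSON files remain the single source of truth.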

---
## 🛠️ Utilities {#utilities}

### PDF → Markdown Conversion with Logging

In [None]:
%%bash
# PDF → Markdown conversion
cd /Users/danieledragoni/git/LIste
source .venv/bin/activate
mkdir -p logs  # tee needs the log directory to exist

echo "🔄 PDF → Markdown conversion"
echo "📂 Input: ptof/*.pdf"
echo "📂 Output: ptof_md/*.md"
echo "========================================"

python src/processing/convert_pdfs_to_md.py 2>&1 | tee logs/pdf_conversion.log

echo "========================================"
echo "✅ Conversion finished!"
echo "📋 Log: logs/pdf_conversion.log"

---
## 🤖 Background Automation {#background-automation}

### Background Fixer - CLI with Logging

In [None]:
%%bash
# Background Fixer - automatic anomaly correction
cd /Users/danieledragoni/git/LIste
source .venv/bin/activate
mkdir -p logs  # tee needs the log directory to exist

echo "🛠️ Background Fixer - Starting"
echo "📂 Input: data/review_flags.json"
echo "📂 Output: analysis_results/"
echo "🤖 Model: qwen2.5-coder:7b (Ollama)"
echo "========================================"

python run_fixer.py 2>&1 | tee logs/background_fixer.log

echo "========================================"
echo "✅ Correction finished!"
echo "📋 Log: logs/background_fixer.log"

---
## 📊 Diagnostics {#diagnostics}

### Check Full System Status

In [None]:
# System status check - full diagnostics with logging
import os
import json
import logging
from glob import glob
from datetime import datetime

# force=True replaces handlers left over from earlier cells
os.makedirs('logs', exist_ok=True)
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('logs/system_diagnostics.log', encoding='utf-8'),
        logging.StreamHandler()
    ],
    force=True
)
logger = logging.getLogger(__name__)

logger.info("="*80)
logger.info("📊 PTOF ANALYSIS SYSTEM DIAGNOSTICS")
logger.info(f"🕐 Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
logger.info("="*80)

# Check review flags
logger.info("\n🚩 DETECTED ANOMALIES")
if os.path.exists("data/review_flags.json"):
    with open("data/review_flags.json", 'r', encoding='utf-8') as f:
        flags = json.load(f)
    logger.info(f"  Files with anomalies: {len(flags)}")
else:
    logger.info("  No anomalies file found")

# Count files
logger.info("\n📄 SYSTEM FILES")
json_files = glob("analysis_results/*.json")
logger.info(f"  Analysis JSON files: {len(json_files)}")

md_files = glob("ptof_md/*.md")
logger.info(f"  Markdown files: {len(md_files)}")

pdf_inbox = glob("ptof_inbox/*.pdf")
logger.info(f"  PDFs in inbox: {len(pdf_inbox)}")

# API configuration (only reports whether a key is set, never its value)
logger.info("\n🔐 API CONFIGURATION")
if os.path.exists("data/api_config.json"):
    with open("data/api_config.json", 'r', encoding='utf-8') as f:
        api_config = json.load(f)
    logger.info(f"  Default Provider: {api_config.get('default_provider', 'N/A')}")
    logger.info(f"  Gemini API: {'✅' if api_config.get('gemini_api_key') else '❌'}")
    logger.info(f"  OpenRouter API: {'✅' if api_config.get('openrouter_api_key') else '❌'}")

# "\n=" * 80 would repeat the newline 80 times, so build the rule separately
logger.info("\n" + "="*80)
logger.info("✅ Diagnostics finished!")
logger.info("📋 Log: logs/system_diagnostics.log")

---
## 📝 Notes

### Best Practices

- ✅ **Virtual Environment**: All commands use `.venv`
- ✅ **Logging**: Every operation saves its log in `logs/`
- ✅ **Automatic Workflow**: Use `workflow_ptof.py` to process new PDFs
- ⚠️ **API Keys**: Configured in `data/api_config.json`
- ⚠️ **Ollama**: Must be running for the multi-agent pipeline
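
Since the multi-agent pipeline depends on Ollama, a quick reachability check before a long batch can save time. The sketch below assumes Ollama listens on its default address `http://localhost:11434` and queries its model-listing endpoint `/api/tags`; the function name `is_ollama_running` is illustrative and not part of the project.

```python
import urllib.error
import urllib.request


def is_ollama_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP GET on <base_url>/api/tags answers with status 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error or malformed URL
        return False
```

A cell could call `is_ollama_running()` first and stop early with a clear message when it returns `False`.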

### Directories

- `ptof_inbox/` - PDFs to analyze
- `ptof_processed/` - Archived PDFs
- `ptof_md/` - Generated Markdown
- `analysis_results/` - Analysis JSON
- `logs/` - Log files
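
The workflow log shows these five directories being verified at startup. A minimal sketch of an equivalent check, useful before running individual cells on a fresh checkout, is given below; the names `REQUIRED_DIRS` and `ensure_directories` are hypothetical.

```python
from pathlib import Path

REQUIRED_DIRS = ("ptof_inbox", "ptof_processed", "ptof_md", "analysis_results", "logs")


def ensure_directories(root=".", names=REQUIRED_DIRS):
    """Create any missing project directory and return the names that were created."""
    created = []
    for name in names:
        path = Path(root) / name
        if not path.is_dir():
            path.mkdir(parents=True)
            created.append(name)
    return created
```

Calling it a second time returns an empty list, so it is safe to run at the top of any cell.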