Platform Engineer · AIOps · MLOps · LLM-Orchestrated Infrastructure
Research Fellow, University of Bologna · Bologna, Italy
I build autonomous AI systems that act on infrastructure rather than just explain it. Seven years of hands-on ops in mission-critical industrial environments, followed by a PhD in HPC systems, give me a different lens: I care about correctness, observability, and production trust.
KubeIntellect — Autonomous Kubernetes Operations
LLM-orchestrated multi-agent framework for root cause analysis, diagnosis, and human-gated cluster operations across the full Kubernetes API surface.
- LangGraph FSM supervisor with PostgreSQL checkpoints and human-in-the-loop approval gates
- Dynamic Code-Generator agent: sandboxed tool synthesis and validation at runtime
- Modular domain agents: logs, metrics, RBAC, lifecycle, scheduling, exec, proxy
- 93% tool synthesis success rate · 100% reliability across 200+ queries
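
The human-gated pattern above can be illustrated with a small state machine. This is a minimal sketch in plain Python, not KubeIntellect's actual code and not the LangGraph API: the names `State`, `Run`, `step`, and `drive` are hypothetical, the in-memory `history` list stands in for persisted checkpoints, and the `approve` callback stands in for a human reviewer.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable


class State(Enum):
    DIAGNOSE = auto()
    AWAIT_APPROVAL = auto()
    EXECUTE = auto()
    DONE = auto()
    REJECTED = auto()


@dataclass
class Run:
    query: str
    state: State = State.DIAGNOSE
    plan: str = ""
    # Hypothetical stand-in for durable checkpoints (e.g. rows in a database).
    history: list = field(default_factory=list)


def step(run: Run, approve: Callable[[str], bool]) -> Run:
    """Advance the run by one transition and record the new state."""
    if run.state is State.DIAGNOSE:
        # Placeholder for whatever plan a diagnosis agent would propose.
        run.plan = f"proposed action for: {run.query}"
        run.state = State.AWAIT_APPROVAL
    elif run.state is State.AWAIT_APPROVAL:
        # Nothing executes unless the approval callback returns True.
        run.state = State.EXECUTE if approve(run.plan) else State.REJECTED
    elif run.state is State.EXECUTE:
        run.state = State.DONE
    run.history.append(run.state.name)
    return run


def drive(query: str, approve: Callable[[str], bool]) -> Run:
    """Step the machine until it reaches a terminal state."""
    run = Run(query)
    while run.state not in (State.DONE, State.REJECTED):
        step(run, approve)
    return run
```

Calling `drive("pod in CrashLoopBackOff", lambda plan: False)` ends in `State.REJECTED` without ever entering `EXECUTE`, which is the property the approval gate is meant to guarantee.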
| Project | Description | Highlights | Stack |
|---|---|---|---|
| kube_q | CLI + Python SDK for KubeIntellect | Streaming responses, Rich TUI | Python |
| AOBench | Agent Operations Benchmark — role-aware, permission-enforced, trace-based HPC agent evaluation | 80 tasks · 26 environments | Python, LLM Eval, MCP |
| GRAAFE | Graph anomaly anticipation for exascale HPC | AUC 0.91 · 1000+ nodes | Python, GCN |
| HazardNet | Thermal hazard prediction for datacenters | F1 0.99 · <100ms inference | Python, TCN/LSTM |
PhD: Design, Analysis, and Management of High-Performance Computing Systems · University of Bologna (2018–2022)
EU Projects: DECICE · Graph-Massivizer · EUROPEAN PILOT · REGALE · EPI SGA1 · SEANERGYS
Google Scholar:
| Citations | h-index | i10-index |
|---|---|---|
| 179 (154 since 2021) | 7 | 6 |
| Title | Venue | Year | Citations |
|---|---|---|---|
| KubeIntellect: A Modular LLM-Orchestrated Agent Framework for Kubernetes Management | arXiv | 2025 | — |
| M100 ExaData: A Data Collection Campaign on CINECA's Marconi100 Tier-0 Supercomputer | Nature Scientific Data | 2023 | 50 |
| PM100: A Job Power Consumption Dataset of a Large-Scale Production HPC System | SC'23 Workshops | 2023 | 21 |
| GRAAFE: Graph Anomaly Anticipation Framework for Exascale HPC Systems | FGCS | 2024 | 17 |
| HazardNet: Thermal Hazard Prediction Framework for Datacenters | FGCS | 2024 | — |
| Multi-level Anomaly Prediction in Tier-0 Datacenter | ACM Computing Frontiers | 2022 | — |
Platform & Infrastructure · AI / ML · HPC · Observability
PC Member: PDP 2025 · PDP 2026 · AsHES 2026
Reviewer: IEEE TCAD · FGCS · Journal of Grid Computing · SC · ACM CF · DATE · PDP · AsHES
Supervision: 2 PhD co-advisees (ongoing) · 5 MSc theses completed · Lab of Big Data Architectures, UniBo (2020–2024)


