I work on the loop between large language models and the people who use them — systems that get better the more they are used. Prompts that learn from feedback, retrieval that reranks itself, evaluation that closes its own gaps. Most of what I publish is the runtime under that idea, written small enough to read in one sitting.
I write Python by default and choose the smallest tool that survives contact with production. I care about clear interfaces, honest benchmarks, and code that another person can own after I leave.
RLprompt · pypi: An online reinforcement-learning framework for system-prompt refinement. Each human interaction defines a perception cycle that feeds a two-stage critic, so the prompt evolves with use.
code-quality-mcp: A Model Context Protocol server that exposes Python static analysis (flake8, mypy, McCabe, vulture) to LLM agents as first-class tools rather than parsed shell output.
Iterative-shifting-disaggregation: Implementation of the ISD algorithm for decomposing aggregated time series into their constituent signals.
reranker-research: Ongoing notes and experiments on rerankers for retrieval and RAG pipelines.
I am building tooling around evaluation and feedback loops for LLM applications, and I am open to conversations with teams thinking seriously about the same problems.
linkedin · orcid · pypi