This repository contains companion code for the blog post Human-Aligned LLM Evaluation with DSPy.
The tutorial uses data from the MultiClinSUM shared task, a multilingual clinical report summarization challenge organized by the Barcelona Supercomputing Center's NLP for Biomedical Information Analysis group.
The tutorial/ directory contains a complete end-to-end tutorial demonstrating the Prodigy-DSPy workflow for clinical report summarization. It guides you through:
- Annotating data with a baseline DSPy program
- Evaluating and collecting human feedback on metrics
- Synthesizing insights from feedback
- Optimizing the program with human-in-the-loop guidance
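The evaluation and feedback steps in the list above boil down to one question: how often does an automatic metric agree with a human reviewer? The plain-Python sketch below illustrates that idea. It does not use the Prodigy or DSPy APIs. The `Judgement` record and the `metric_alignment` function are hypothetical names, not part of this repository. It assumes each summary has a pass/fail verdict from the metric and another from a human annotator. It reports the agreement rate and the IDs of disagreements, which are the cases worth reviewing when synthesizing feedback.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """Hypothetical record: one summary judged by both the metric and a human."""
    example_id: str
    metric_pass: bool  # verdict from the automatic (e.g. LLM-based) metric
    human_pass: bool   # verdict from the human annotator


def metric_alignment(judgements: list[Judgement]) -> tuple[float, list[str]]:
    """Return the fraction of examples where metric and human agree,
    plus the IDs of the examples where they disagree."""
    if not judgements:
        raise ValueError("need at least one judgement")
    disagreements = [j.example_id for j in judgements if j.metric_pass != j.human_pass]
    agreement = 1 - len(disagreements) / len(judgements)
    return agreement, disagreements


# Illustrative, made-up verdicts for four summaries.
judgements = [
    Judgement("report-1", metric_pass=True, human_pass=True),
    Judgement("report-2", metric_pass=True, human_pass=False),
    Judgement("report-3", metric_pass=False, human_pass=False),
    Judgement("report-4", metric_pass=False, human_pass=False),
]
rate, to_review = metric_alignment(judgements)
print(rate, to_review)  # 0.75 ['report-2']
```

A low agreement rate signals that the metric needs refinement before it is used to optimize the program. The listed disagreements show where the refinement should focus.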
See the tutorial README for detailed instructions on running the project.
Requirements:
- Prodigy Company Plugins version 0.5.0 or later
- See tutorial/requirements.txt for full dependencies