feat: Add commit message evaluation system #11
Merged
nick-galluzzo merged 11 commits into main on Aug 1, 2025
Conversation
New Pydantic models for evaluating commit message quality with WHAT/WHY dimensions, scoring system, and quality thresholds. Includes EvaluationDimension enum and EvaluationResult model with validation rules and serialization methods.
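The models described above could be sketched as follows. This is a minimal illustration, not the PR's actual code: the field names (`what_score`, `why_score`, `reasoning`) and the averaging in `overall_score` are assumptions based on the WHAT/WHY dimensions and 1-5 scale mentioned in the commits.

```python
from enum import Enum

from pydantic import BaseModel, Field


class EvaluationDimension(str, Enum):
    """Dimensions a commit message is scored on."""

    WHAT = "what"  # does the message describe what changed?
    WHY = "why"    # does it explain why the change was made?


class EvaluationResult(BaseModel):
    """Scores for a single commit message, one 1-5 score per dimension."""

    what_score: int = Field(ge=1, le=5)
    why_score: int = Field(ge=1, le=5)
    reasoning: str = ""

    @property
    def overall_score(self) -> float:
        # Hypothetical aggregate: a simple average of the two dimensions.
        return (self.what_score + self.why_score) / 2
```

Pydantic's `Field(ge=1, le=5)` constraints enforce the 1-5 scale at construction time, so an out-of-range score from the LLM fails fast instead of propagating.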
…h Chain-of-Thought reasoning
- Introduce LLMEvaluator class for assessing commit message quality
- Add evaluation prompts with few-shot examples and structured JSON responses
- Implement EvaluationResult model with WHAT/WHY scoring (1-5 scale)
- Update CLI to display evaluation results alongside generated commit messages
- Refactor AI client to support both generation and evaluation workflows
- Add comprehensive tests for evaluation system components
- Update default model to qwen_3_coder (non-free version)
- Improve debugging workflow with interactive debug runner script
- Remove deprecated debugpy attachment configuration from CLI
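Since the evaluator asks the LLM for a structured JSON response, a parsing-and-validation step like the sketch below typically sits between the raw model output and the result model. The field names and the response shape are assumptions, not the PR's actual wire format.

```python
import json

# Hypothetical shape of the evaluator's structured JSON reply.
RAW_RESPONSE = """
{"what_score": 4, "why_score": 3,
 "reasoning": "Describes the change; motivation is only implicit."}
"""


def parse_evaluation(raw: str) -> dict:
    """Parse the model's JSON reply and range-check the 1-5 scores.

    Field names are assumptions for illustration.
    """
    data = json.loads(raw)
    for key in ("what_score", "why_score"):
        if not 1 <= data[key] <= 5:
            raise ValueError(f"{key} out of range: {data[key]}")
    return data
```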
…ommand
- Added new evaluate CLI command for commit message evaluation
- Removed automatic evaluation from generate command
- Updated CLI module imports and registrations
- Fixed missing newline in evaluate.py file
…-readable levels
- Introduce ScoreThresholds constants for quality ratings
- Add QualityRater class with methods for quality level assessment
- Update EvaluationResult model with quality_level and is_high_quality properties
- Replace inline quality logic with centralized rating system
- Add comprehensive tests for new rating functionality
- Update existing tests to use new quality_level property
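The centralized rating idea in this commit could look roughly like the following. The threshold values and level names here are invented for illustration; only the class names (`ScoreThresholds`, `QualityRater`) and the properties they back come from the commit message.

```python
class ScoreThresholds:
    """Hypothetical cutoffs on the 1-5 scale; the actual values are not in the PR."""

    HIGH = 4.0
    MODERATE = 3.0


class QualityRater:
    """Translates a numeric score into a human-readable quality level."""

    @staticmethod
    def quality_level(score: float) -> str:
        if score >= ScoreThresholds.HIGH:
            return "high"
        if score >= ScoreThresholds.MODERATE:
            return "moderate"
        return "low"

    @staticmethod
    def is_high_quality(score: float) -> bool:
        return score >= ScoreThresholds.HIGH
```

Keeping the cutoffs in one constants class means the CLI formatter, the `EvaluationResult` properties, and the tests all agree on what "high quality" means.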
…ith color-coded ratings
…tests
- Introduce EvaluationService to encapsulate evaluation workflow
- Update CLI command to use new service and display formatter
- Fix typo in prompt template tag
- Reduce AI client temperature from 0.3 to 0.1 for more deterministic results
- Remove redundant model_used field from evaluation response parsing
- Move prompt construction from AIClient to CLI generate command
- Simplify AIClient to accept pre-built prompts
- Update tests to reflect new parameter structure
- Remove redundant validation logic from client layer
- Introduce GenerationService as high-level interface for commit message generation
- Add GenerationResult and GenerationRequest models for structured data handling
- Create CommitMessageGenerator for LLM-based message creation
- Refactor CLI generate command to use new service layer
- Update tests to reflect new architecture and component responsibilities
- Rename LLMEvaluator to CommitMessageEvaluator for clarity
- Remove direct AI client and diff parser usage from CLI layer

This change establishes a cleaner separation of concerns in the generation pipeline while maintaining existing functionality.
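The service-layer separation described in this commit can be sketched as below. The field names on the request/result models and the method names are assumptions; the point is the dependency direction: the CLI depends on `GenerationService`, which depends on a generator abstraction, so the AI client never leaks into the CLI layer.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class GenerationRequest:
    """Structured input to the generation pipeline (fields are assumptions)."""

    diff: str


@dataclass
class GenerationResult:
    """Structured output: the message plus minimal provenance."""

    message: str
    model_used: str


class CommitMessageGenerator(Protocol):
    """Anything that can turn a request into a commit message."""

    def generate(self, request: GenerationRequest) -> GenerationResult: ...


class GenerationService:
    """High-level facade: the CLI talks to this, never to the AI client directly."""

    def __init__(self, generator: CommitMessageGenerator) -> None:
        self._generator = generator

    def generate_commit_message(self, diff: str) -> GenerationResult:
        return self._generator.generate(GenerationRequest(diff=diff))
```

Because `CommitMessageGenerator` is a `Protocol`, tests can substitute a stub generator without touching any LLM client.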
This PR introduces a comprehensive commit message evaluation system alongside architectural improvements to the generation workflow.
Key Features
- `evaluate` command to assess commit message quality with LLM-based Chain-of-Thought reasoning
- with color-coded display formatting
- dedicated service layer with proper models
- diffs with improved error handling
Major Changes
- formatters
- layer
Minor Changes
Commands
- `diffmage evaluate` - Evaluate existing commit message quality
- `diffmage generate` - Generate commit messages (improved architecture)