Feat/110 retrieval tracking #14
Merged
savourylie merged 7 commits into dev on Sep 1, 2025
- Implement concurrent LLM calls with controlled delays to avoid server overload
- Add new command-line arguments for concurrency and delay settings
- Refactor run_benchmark_evaluation to use concurrent tasks
- Update related scripts to support new concurrency options
…tions
- Remove concurrent_delay argument from run_benchmark.py, scripts/load_and_run.py, and scripts/run_evaluation.py
- Update _evaluate_single_question and run_benchmark_evaluation functions to remove staggered execution logic
- This change simplifies the concurrency control mechanism, using asyncio's built-in concurrency primitives without adding artificial delays
…iciency
- Implement concurrent evaluation of questions using asyncio Semaphore
- Refactor question evaluation logic into a separate function
- Update default LLM provider to OpenAI in run_benchmark.py
- Change default LLM provider to Gemini in utils.py
- Optimize data loading and evaluation flow
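The semaphore-bounded evaluation described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `evaluate_question`, `evaluate_all`, and `max_concurrent` are hypothetical names, and the sleep stands in for a real LLM call.

```python
import asyncio


async def evaluate_question(question: str) -> str:
    """Hypothetical stand-in for one LLM-backed evaluation."""
    await asyncio.sleep(0.01)  # simulate network latency
    return question.upper()


async def evaluate_all(questions, max_concurrent=3):
    """Evaluate all questions, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    active = 0  # evaluations currently holding the semaphore
    peak = 0    # highest value of active observed, for illustration

    async def bounded(question):
        nonlocal active, peak
        async with semaphore:  # waits here once the limit is reached
            active += 1
            peak = max(peak, active)
            try:
                return await evaluate_question(question)
            finally:
                active -= 1

    # gather preserves input order regardless of completion order
    results = await asyncio.gather(*(bounded(q) for q in questions))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(evaluate_all([f"q{i}" for i in range(10)]))
    print(results, peak)
```

The semaphore alone caps in-flight requests, so no artificial delay between task starts is needed to keep the load bounded.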
…king improvements
- Added concurrent evaluation support from feat/concurrent branch
- Extended retrieval metrics to support MSC dataset alongside LME
- Enhanced string normalization with regex-based punctuation removal for better matching
- Improved argument parsing with provider-specific model defaults
- Updated gitignore to include .vscode/ directory
- Maintained backward compatibility while adding new features
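As an illustration of the regex-based punctuation removal mentioned above, here is a small sketch. The function names `normalize` and `contains` and the exact patterns are assumptions for the example; the PR's actual normalization rules are not shown on this page.

```python
import re

# Assumed patterns: anything that is not a word character or whitespace
# counts as punctuation; runs of whitespace collapse to one space.
_PUNCTUATION = re.compile(r"[^\w\s]")
_WHITESPACE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    text = _PUNCTUATION.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


def contains(haystack: str, needle: str) -> bool:
    """Substring match on normalized text (hypothetical matching rule)."""
    return normalize(needle) in normalize(haystack)
```

Replacing punctuation with a space rather than deleting it keeps words such as "dog, Rex" from fusing into "dogrex", at the cost of splitting contractions ("it's" becomes "it s").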
…luation
- Fixed 'list' object has no attribute 'lower' error by using correct function call
- Restored calculate_enhanced_retrieval_metrics() instead of simple calculate_retrieval_metrics()
- Includes both enhanced and legacy metrics for comparison
- Maintains concurrent evaluation while preserving enhanced retrieval tracking features
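For readers unfamiliar with this error, the sketch below shows one common way it arises: a helper that expects a single string is handed a list of strings. This is only an assumed reproduction; the actual call site in the PR is not visible here, and `normalize_answer` is a hypothetical name.

```python
def normalize_answer(text: str) -> str:
    """Hypothetical helper that expects a single string."""
    return text.lower().strip()


retrieved = ["Alpha ", "BETA"]

# Passing the whole list triggers the AttributeError from the commit message.
try:
    normalize_answer(retrieved)
except AttributeError as exc:
    error_message = str(exc)

# Applying the helper per element is one way to avoid it.
normalized = [normalize_answer(item) for item in retrieved]
```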
…e mode
- Add --retrieval-verbose flag to capture and display full retrieved memory content
- Enhance incorrect questions CSV export with recall flags for better analysis
- Support CSV format parsing in question IDs file for compatibility
- Improve memory content display with score information and truncation
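Accepting a question IDs file in either plain-text or CSV form could look roughly like the sketch below. The function name `parse_question_ids`, the `question_id` column name, and the comma-based format detection are all assumptions made for illustration, not details taken from the PR.

```python
import csv
import io


def parse_question_ids(text: str, id_column: str = "question_id") -> list[str]:
    """Return question IDs from plain text (one per line) or CSV content."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    # No comma on the first line: treat the file as one ID per line.
    if "," not in lines[0]:
        return [line.strip() for line in lines]
    body = "\n".join(lines)
    reader = csv.DictReader(io.StringIO(body))
    # CSV with a recognizable header: read the assumed ID column.
    if reader.fieldnames and id_column in reader.fieldnames:
        return [row[id_column].strip() for row in reader if row[id_column]]
    # CSV without that header: fall back to the first column of every row.
    return [row[0].strip() for row in csv.reader(io.StringIO(body)) if row]
```

With this shape, an incorrect-questions CSV export that carries extra columns (such as recall flags) could be fed back in as the IDs file without manual editing.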
No description provided.