Feat/110 retrieval tracking #14

Merged
savourylie merged 7 commits into dev from feat/110-retrieval-tracking
Sep 1, 2025
Conversation

@savourylie
Contributor

No description provided.

M1n9X and others added 7 commits August 25, 2025 17:28
- Implement concurrent LLM calls with controlled delays to avoid server overload
- Add new command-line arguments for concurrency and delay settings
- Refactor run_benchmark_evaluation to use concurrent tasks
- Update related scripts to support new concurrency options
…tions

- Remove concurrent_delay argument from run_benchmark.py, scripts/load_and_run.py, and scripts/run_evaluation.py
- Update _evaluate_single_question and run_benchmark_evaluation functions to remove staggered execution logic
- This change simplifies the concurrency control mechanism, using asyncio's built-in concurrency primitives without adding artificial delays
…iciency

- Implement concurrent evaluation of questions using asyncio Semaphore
- Refactor question evaluation logic into a separate function
- Update default LLM provider to OpenAI in run_benchmark.py
- Change default LLM provider to Gemini in utils.py
- Optimize data loading and evaluation flow
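The semaphore-based concurrency described in these commits can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `fake_llm_call`, `evaluate_single_question`, `run_evaluation`, and the `max_concurrent` parameter are hypothetical names chosen for the example, and the LLM request is replaced by a short sleep.

```python
import asyncio


async def fake_llm_call(question: str) -> str:
    # Hypothetical stand-in for a real LLM request.
    await asyncio.sleep(0.01)
    return f"answer to {question}"


async def evaluate_single_question(question: str, semaphore: asyncio.Semaphore) -> str:
    # The semaphore caps how many questions are in flight at once,
    # so no artificial delay between task starts is needed.
    async with semaphore:
        return await fake_llm_call(question)


async def run_evaluation(questions, max_concurrent: int = 4):
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [evaluate_single_question(q, semaphore) for q in questions]
    # gather() returns results in the same order as the input tasks.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(run_evaluation(["q1", "q2", "q3"], max_concurrent=2)))
```

Bounding concurrency with a semaphore keeps the server load predictable without the staggered-start logic that the earlier commit removed.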
…king improvements

- Added concurrent evaluation support from feat/concurrent branch
- Extended retrieval metrics to support MSC dataset alongside LME
- Enhanced string normalization with regex-based punctuation removal for better matching
- Improved argument parsing with provider-specific model defaults
- Updated .gitignore to include the .vscode/ directory
- Maintained backward compatibility while adding new features
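One way regex-based punctuation removal for string matching might look is shown below. This is a sketch under assumptions: the function name `normalize_text` and the exact rules (lowercasing, replacing ASCII punctuation with spaces, collapsing whitespace) are illustrative and may differ from what the PR actually implements.

```python
import re
import string

# Character class matching any ASCII punctuation character.
_PUNCT_RE = re.compile(f"[{re.escape(string.punctuation)}]")


def normalize_text(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    lowered = text.lower()
    no_punct = _PUNCT_RE.sub(" ", lowered)
    return " ".join(no_punct.split())


if __name__ == "__main__":
    print(normalize_text("Hello,   World!"))
```

Replacing punctuation with a space rather than deleting it keeps tokens such as "p.m." from fusing with their neighbours, at the cost of splitting them into separate tokens.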
…luation

- Fixed the "'list' object has no attribute 'lower'" error by calling the correct function
- Restored calculate_enhanced_retrieval_metrics() instead of simple calculate_retrieval_metrics()
- Includes both enhanced and legacy metrics for comparison
- Maintains concurrent evaluation while preserving enhanced retrieval tracking features
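The error mentioned in this commit typically appears when a list is passed to code that expects a single string. The sketch below reproduces that failure and shows a recall computation that normalizes items one at a time. The helpers `normalize` and `recall` are hypothetical and are not the PR's `calculate_enhanced_retrieval_metrics()`.

```python
def normalize(text):
    # Expects a single string; passing a list raises AttributeError.
    return text.lower().strip()


def recall(retrieved, expected):
    """Fraction of expected items found among the retrieved items."""
    if not expected:
        return 0.0
    retrieved_norm = {normalize(item) for item in retrieved}
    hits = sum(1 for item in expected if normalize(item) in retrieved_norm)
    return hits / len(expected)


# Reproduce the failure mode: a list is passed where a string is expected.
try:
    normalize(["Paris", "Rome"])
except AttributeError as exc:
    error_message = str(exc)

if __name__ == "__main__":
    print(error_message)
    print(recall(["Paris ", "rome"], ["paris", "Berlin"]))
```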
…e mode

- Add --retrieval-verbose flag to capture and display full retrieved memory content
- Enhance incorrect questions CSV export with recall flags for better analysis
- Support CSV format parsing in question IDs file for compatibility
- Improve memory content display with score information and truncation
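A rough sketch of how these pieces could fit together follows. Only the `--retrieval-verbose` flag name comes from the commit message. The helper names, the assumed `question_id` header, the choice of the first CSV column, and the display format are all assumptions made for illustration.

```python
import argparse
import csv
import io


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Flag name taken from the commit message; the help text is illustrative.
    parser.add_argument("--retrieval-verbose", action="store_true",
                        help="capture and display full retrieved memory content")
    return parser


def parse_question_ids(text: str) -> list:
    """Read IDs from plain lines or from the first column of CSV rows."""
    ids = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        first = row[0].strip()
        # Assumed header name; skipped if present.
        if not first or first.lower() == "question_id":
            continue
        ids.append(first)
    return ids


def format_memory(content: str, score: float, max_chars: int = 80) -> str:
    """Show a memory with its score, truncating long content."""
    shown = content if len(content) <= max_chars else content[:max_chars] + "..."
    return f"[score={score:.3f}] {shown}"


if __name__ == "__main__":
    args = build_parser().parse_args(["--retrieval-verbose"])
    print(args.retrieval_verbose)
    print(parse_question_ids("question_id,recall\nq1,1\nq2,0\n"))
    print(format_memory("a long memory entry", 0.8731, max_chars=6))
```

Reading only the first column lets the same parser accept both a plain list of IDs and an exported incorrect-questions CSV, assuming the ID is the first field.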
2 participants