ECNU3D commented on Jun 5, 2025

This PR extends the original simple-evals repository with the following key improvements. The full extension of simple-evals to agentic eval generation can be found here: https://github.com/ECNU3D/agentic-simple-evals

Additional Model Support

  • Gemini Models: Added support for Google's Gemini models (GeminiSampler) with both API key and Vertex AI authentication, including support for Gemini grounding capabilities
  • Claude on Vertex AI: Implemented ClaudeVertexCompletionSampler for running Claude models through Google Cloud Vertex AI instead of direct Anthropic API
  • Llama Models on Vertex AI: Added examples to show how to integrate with OpenAI API compatible models
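The OpenAI-compatible integration boils down to packing chat messages and request payloads in the shape any `/chat/completions`-style endpoint accepts. A minimal sketch of such a sampler wrapper is below; the class name, constructor parameters, and method names are illustrative, not the repo's actual API:

```python
# Hypothetical sketch of wrapping an OpenAI API compatible endpoint
# (e.g. a Llama model served on Vertex AI) behind a simple-evals-style
# sampler interface. All names here are illustrative assumptions.
class OpenAICompatibleSampler:
    def __init__(self, base_url, model, api_key_env="OPENAI_API_KEY",
                 temperature=0.0):
        self.base_url = base_url        # endpoint serving the model
        self.model = model              # model identifier on that endpoint
        self.api_key_env = api_key_env  # env var holding the credential
        self.temperature = temperature

    def _pack_message(self, role, content):
        # simple-evals-style chat messages are {"role", "content"} dicts
        return {"role": role, "content": content}

    def build_request(self, message_list):
        # Payload shape accepted by OpenAI-compatible chat endpoints
        return {
            "model": self.model,
            "messages": message_list,
            "temperature": self.temperature,
        }
```

The actual HTTP call can then be made with any OpenAI-compatible client pointed at `base_url`.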

Windows Compatibility

  • Windows HumanEval Fix: Added human_eval_windows_patch.py to resolve Windows compatibility issues with the HumanEval benchmark by replacing Unix-specific timeout mechanisms with Windows-compatible threading-based solutions
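The core idea of the patch is to replace Unix-only `signal.alarm`-based timeouts (unavailable on Windows) with a thread that is abandoned if it overruns. A minimal sketch of that pattern, assuming a generic `run_with_timeout` helper rather than the patch's exact function names:

```python
import threading

def run_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn in a daemon thread; raise TimeoutError if it doesn't finish.

    Sketch of the threading-based replacement for signal-based timeouts,
    which only work on Unix. Names are illustrative, not the patch's API.
    """
    result = {}

    def target():
        try:
            result["value"] = fn(*args, **kwargs)
        except Exception as exc:  # propagate errors to the caller's thread
            result["error"] = exc

    t = threading.Thread(target=target, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        # The daemon thread is abandoned; it dies with the process.
        raise TimeoutError(f"execution exceeded {timeout_s}s")
    if "error" in result:
        raise result["error"]
    return result["value"]
```

Note the usual caveat of this approach: the timed-out thread cannot be forcibly killed, only abandoned, which is acceptable for short sandboxed HumanEval checks.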

Infrastructure Improvements

  • Checkpointing System: Implemented robust checkpointing functionality across all evaluations to support resuming interrupted evaluation runs, with checkpoint loading and saving capabilities
  • Batch Processing: Added configurable batch processing to improve memory management and allow for better control over evaluation execution
  • Enhanced Error Handling: Improved exception handling and retry mechanisms for API calls
  • Progress Tracking: Better progress reporting and logging throughout evaluation processes
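Checkpointing and batch processing compose naturally: evaluate in fixed-size batches and persist accumulated results after each one, so a rerun skips everything already scored. A minimal sketch under those assumptions (function names, checkpoint format, and JSON storage are illustrative, not the repo's actual implementation):

```python
import json
import os

def run_with_checkpoints(samples, eval_fn, ckpt_path, batch_size=8):
    """Evaluate samples in batches, checkpointing results after each batch.

    Illustrative sketch: a JSON list of per-sample results is written after
    every batch, and an interrupted run resumes from its length.
    """
    results = []
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            results = json.load(f)  # resume: skip already-evaluated samples
    start = len(results)
    for i in range(start, len(samples), batch_size):
        batch = samples[i:i + batch_size]
        results.extend(eval_fn(s) for s in batch)
        with open(ckpt_path, "w") as f:
            json.dump(results, f)   # checkpoint after every batch
    return results
```

Batch size here doubles as the checkpoint granularity: smaller batches lose less work on interruption at the cost of more frequent writes.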

Configuration Enhancements

  • Environment Variable Handling: Improved API key and authentication management with fallback mechanisms
  • Configurable Parameters: Enhanced parameterization for batch sizes, timeouts, and other evaluation settings
  • Flexible Authentication: Support for multiple authentication methods including API keys, Vertex AI, and Application Default Credentials
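The fallback order described above can be sketched as a small resolver: prefer Vertex AI (which uses Application Default Credentials and needs no key), then an explicitly passed key, then an environment variable. The helper name, return shape, and default env var below are assumptions for illustration:

```python
import os

def resolve_auth(api_key=None, use_vertex=False, env_var="GEMINI_API_KEY"):
    """Resolve credentials with fallbacks: Vertex AI ADC, explicit key, env var.

    Hypothetical helper illustrating the fallback order; not the repo's
    actual function or configuration names.
    """
    if use_vertex:
        # Vertex AI authenticates via Application Default Credentials
        return ("vertex_adc", None)
    key = api_key or os.environ.get(env_var)
    if key:
        return ("api_key", key)
    raise RuntimeError(
        f"No credentials found: pass api_key, set {env_var}, or enable Vertex AI"
    )
```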
