Skip to content

Conversation

@cemde
Copy link
Collaborator

@cemde cemde commented Nov 29, 2025

Description

This PR adds a new example to the library. 5-a-day is a small dummy benchmark with 5 tasks that can be solved in different setups. It showcases a diverse set of tools and evaluation protocols to show the flexiblity of maseval.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code quality improvement (refactoring, formatting, etc.)

Checklist

Contribution

Documentation

  • Added/updated docstrings for new/modified functions
  • Updated relevant documentation in docs/ (if applicable)
  • Tag github issue with this PR (if applicable)

Changelog

  • Added entry to CHANGELOG.md under [Unreleased] section
    • Use Added section for new features
    • Use Changed section for modifications to existing functionality
    • Use Fixed section for bug fixes
    • Use Removed section for deprecated/removed features
  • OR this is a documentation-only change (no changelog needed)

Example:
- Support for multi-agent tracing (PR:#123)

Architecture (if applicable)

  • Core/Interface separation: Changes in maseval/core/ do NOT import from maseval/interface/
  • Dependencies: New core dependencies added sparingly; framework integrations go to optional dependencies

Additional Notes

@github-actions
Copy link

github-actions bot commented Nov 29, 2025

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  maseval/core
  benchmark.py
  evaluator.py
  maseval/interface/agents
  llamaindex.py
Project Total  

This report was generated by python-coverage-comment-action

@cemde cemde merged commit bdda680 into main Dec 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants