Allow for objective and hybrid pareto frontier tracking #92
Conversation
|
In the upstream DSPy issue I coined 'subscores', which is already in use here and can be a bit confusing. Trying to think of the cleanest way to resolve this.
|
Dear @MatsErdkamp, Thank you so much for this PR. It very nicely complements #100, solves #2, and is a much-needed feature in GEPA! I am planning to merge this after #100; however, I think this PR will conflict significantly with it, since both this and #100 make very large (and necessary) improvements. There are some changes to be made to #100 before merging, but could you please plan to rebase these changes over #100 so that the merge conflicts introduced after it lands can be addressed? I expect the timeline for #100 to be within the next 3 days.
…ive-tracking-to-gepa Add objective-aware Pareto tracking and subscores support
- Reorganized import statements for clarity.
- Renamed `pareto_frontier_type` to `frontier_type` for consistency across the codebase.
- Enhanced assertion messages for better debugging.
- Improved formatting and spacing for better readability in multiple files.
- Updated method signatures and docstrings to reflect changes in parameters.
- Added support for tracking discovery evaluation counts in `GEPAResult` and `GEPAState` classes.
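For readers skimming the thread, the sketch below illustrates the core idea behind objective-aware frontier tracking: instead of keeping only the candidate with the best aggregate score, the frontier retains any candidate that leads on at least one objective. This is a minimal conceptual sketch; the names (`Candidate`, `update_frontier`, `objective_scores`) are hypothetical and are not the GEPA API.

```python
# Minimal conceptual sketch of objective-wise Pareto frontier tracking.
# The names below (Candidate, update_frontier, objective_scores) are
# hypothetical illustrations, not the actual GEPA API.
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    objective_scores: dict[str, float]  # e.g. {"accuracy": 0.8, "brevity": 0.3}


def update_frontier(frontier: dict[str, Candidate], candidate: Candidate) -> dict[str, Candidate]:
    """Keep, for each objective, the candidate with the highest score on that objective."""
    for objective, score in candidate.objective_scores.items():
        incumbent = frontier.get(objective)
        if incumbent is None or score > incumbent.objective_scores[objective]:
            frontier[objective] = candidate
    return frontier


frontier: dict[str, Candidate] = {}
update_frontier(frontier, Candidate("a", {"accuracy": 0.8, "brevity": 0.3}))
update_frontier(frontier, Candidate("b", {"accuracy": 0.6, "brevity": 0.9}))
# Both candidates survive: "a" leads on accuracy, "b" leads on brevity.
print({objective: cand.name for objective, cand in frontier.items()})
```

A "hybrid" frontier presumably combines this with aggregate-score tracking, so well-rounded candidates that lead on no single objective are also retained.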
|
@LakshyAAAgrawal Done, with some interesting preliminary results!
|
@MatsErdkamp Dude, that's incredible! What's the path for using this with current DSPy pipelines? Does it rely on subscores? Would love to test out the implementation!
|
Dear @MatsErdkamp, That is an amazing result. I am excited to get to this PR soon. #100 is finally closing in on completion, and I will shift my focus to this soon after. On that note, there have been several more updates to #100, so we will still need to merge/rebase against that.
|
Dear @MatsErdkamp, #100 has been merged. I will now shift focus towards working on this. Could you please merge from main and fix any conflicts, and then I can start reviewing this?
…roved validation handling.
…jective_scores' for clarity and consistency in handling evaluation metrics.
|
@LakshyAAAgrawal it's been updated!
|
Hi @MatsErdkamp, I am trying to resolve the merge conflicts from main in preparation to merge this. Could you please add some end-to-end tests using both objective and hybrid tracking, so that I can be sure the merge conflict resolution does not cause any issues? It would be especially useful if the tests can be as end-to-end as https://github.com/gepa-ai/gepa/blob/main/tests/test_aime_prompt_optimization/test_aime_prompt_optimize.py.
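As a rough illustration of the kind of parametrized end-to-end check being requested, the toy sketch below compares frontier modes on hard-coded scores rather than exercising the GEPA engine; the actual tests live under tests/ in the repository, and treating "hybrid" as objective-wise plus aggregate-best tracking is an assumption here, not something confirmed by this thread.

```python
# Illustrative sketch only: a parametrized test comparing frontier modes on
# toy, hard-coded scores. The frontier helper is a stand-in, not GEPA internals.
import pytest

CANDIDATES = {
    "a": {"accuracy": 0.9, "brevity": 0.2},
    "b": {"accuracy": 0.2, "brevity": 0.9},
    "c": {"accuracy": 0.7, "brevity": 0.7},
}
OBJECTIVES = ("accuracy", "brevity")


def frontier_members(frontier_type: str) -> set[str]:
    """Return the names of frontier candidates under a given tracking mode (toy version)."""
    members = set()
    # Objective tracking: keep the best candidate per objective.
    for obj in OBJECTIVES:
        members.add(max(CANDIDATES, key=lambda name: CANDIDATES[name][obj]))
    if frontier_type == "hybrid":
        # Assumed here: hybrid additionally keeps the best aggregate-score candidate.
        members.add(max(CANDIDATES, key=lambda name: sum(CANDIDATES[name].values())))
    return members


@pytest.mark.parametrize("frontier_type", ["objective", "hybrid"])
def test_frontier_is_nonempty(frontier_type):
    assert frontier_members(frontier_type)


def test_frontier_types_give_different_results():
    # "c" has the best aggregate score but leads on no single objective,
    # so the two tracking modes disagree on the frontier contents.
    assert frontier_members("objective") != frontier_members("hybrid")
```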
|
@MatsErdkamp, I have tried resolving the merge conflicts and pushed my changes. Could you please implement some tests based on your last commit (e7e3fb5) and add them to the PR?
|
Dear @MatsErdkamp, The PR looks great. Thank you so much for your kind contribution. Could you please check whether my changes are as you'd expect, by comparing the tip of the branch against your last commit (e7e3fb5)? Next, based on that commit itself, could you please implement some e2e tests to ensure my changes didn't break anything? Once the tests are there, I will go ahead and merge!
…ter for scoring function
… sizes and metric parameters
…different results between frontier types
|
This was quite tricky, but it seems I got something working. The tests should also pass now (at least they do locally). @LakshyAAAgrawal
|
Hi @MatsErdkamp, Thank you so much for helping with this! Unfortunately the tests are still failing. Would you be able to take a look?
|
Should work now? Could you approve the workflow please, @LakshyAAAgrawal?
|
Dear @MatsErdkamp, Thank you so much for this PR. It adds a very important feature to GEPA.
|
Thank you for taking the time to correct some stuff, @LakshyAAAgrawal. This was the first time I worked on a project like this; I learned a lot, and there's still lots to learn!
|
I apologize this took so long to merge, but I look forward to future contributions from you, both here and on DSPy/GEPA. Let me know if there's one already open that I should direct my attention to.
|
I do have a draft PR open to handle the DSPy side of this feature: stanfordnlp/dspy#8888. I see it has some merge conflicts now, so I'll fix those and let you know. Any feedback you might already have on the implementation is welcome!

Aims to solve this issue
This is currently downstream from an issue in DSPy. I took a crack at solving it myself, but the implementation will probably change. I will set up the PR for this soon, after some final polish.
Then I made another branch for DSPy that adapts GEPA in DSPy to support the objective frontier. I'm using this to test the implementation.
Quite a mess, with downstream and upstream issues; see this as a potential implementation so we can get a sense of how the full 'subscores' implementation would feel, both in DSPy and GEPA.
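To give a rough sense of how a 'subscores'-style metric could feel from the user's side, a metric might return an aggregate score alongside named per-objective scores for the frontier to track. This is a concept sketch only; it is not the API from this PR or from stanfordnlp/dspy#8888, and all names are illustrative.

```python
# Concept sketch only: a metric that returns an aggregate score plus named
# per-objective subscores. Not the API from this PR or stanfordnlp/dspy#8888;
# the field names and signature are illustrative.
def metric(example, prediction):
    subscores = {
        "correctness": float(prediction["answer"] == example["answer"]),
        "brevity": 1.0 if len(prediction["answer"]) <= 100 else 0.0,
    }
    # The aggregate score drives ordinary optimization; the subscores let an
    # objective or hybrid frontier keep candidates that excel on any one axis.
    return {"score": sum(subscores.values()) / len(subscores), "subscores": subscores}


example = {"answer": "42"}
prediction = {"answer": "42"}
print(metric(example, prediction))  # {'score': 1.0, 'subscores': {...}}
```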