Skip to content

Add benchmark version to result json.#1407

Merged
superdosh merged 2 commits into
mainfrom
results-version
Dec 5, 2025
Merged

Add benchmark version to result json.#1407
superdosh merged 2 commits into
mainfrom
results-version

Conversation

@superdosh
Copy link
Copy Markdown
Contributor

For #1406

@superdosh superdosh temporarily deployed to Scheduled Testing December 5, 2025 14:02 — with GitHub Actions Inactive
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Dec 5, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@superdosh superdosh marked this pull request as ready for review December 5, 2025 14:06
@superdosh superdosh requested a review from a team as a code owner December 5, 2025 14:06
Comment thread src/modelbench/record.py Outdated
Copy link
Copy Markdown
Contributor

@wpietri wpietri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solves the stated need, but I'm wondering if we should just put in the full benchmark uid twice: once in the string form and once in the full JSON form. Seems like the real problem here is "want a piece of the benchmark UID without parsing it", so unless we're sure version is the only thing we need, I'd rather do it all at once unless there's some reason not to.

@rogthefrog, given that you're both the requester and the person who has spent the most time on structured benchmark UIDs, what are your thoughts?

@superdosh
Copy link
Copy Markdown
Contributor Author

This solves the stated need, but I'm wondering if we should just put in the full benchmark uid twice: once in the string form and once in the full JSON form. Seems like the real problem here is "want a piece of the benchmark UID without parsing it", so unless we're sure version is the only thing we need, I'd rather do it all at once unless there's some reason not to.

@rogthefrog, given that you're both the requester and the person who has spent the most time on structured benchmark UIDs, what are your thoughts?

Conclusion from standup was that we'd put everything in. Will update the PR!

@superdosh superdosh temporarily deployed to Scheduled Testing December 5, 2025 18:02 — with GitHub Actions Inactive
Comment thread src/modelbench/record.py
elif isinstance(o, BenchmarkDefinition):
return {"uid": o.uid, "hazards": o.hazards()}
def_dict = {"uid": o.uid, "hazards": o.hazards()}
def_dict.update(o.uid_dict)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need both the uid and the uid dictionary? Do they contain different information?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dictionary contains all the elements of the uid, but it doesn't explain how to turn the elements into the full string. So if anybody needs to compare to a string uid, they'll need either import modelbench or duplicate the logic. Plus with file format it's generally better to be backwards compatible, so that we don't have to hunt down and rewrite anything that parses the results JSON.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an example of what uid_dict looks like:

{'class': 'general_purpose_ai_chat_benchmark', 'version': '1.1', 'locale': 'en_us', 'prompt_set': 'practice', 'evaluator': 'default'}

Copy link
Copy Markdown
Contributor

@rogthefrog rogthefrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

@superdosh superdosh merged commit 3310f9f into main Dec 5, 2025
2 checks passed
@superdosh superdosh deleted the results-version branch December 5, 2025 19:16
@github-actions github-actions Bot locked and limited conversation to collaborators Dec 5, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants