Fix flaky tests by making ImmutableLabelInfo ID assignment deterministic #424

lbh930 · 2025-11-29T22:44:48Z

Description

Modified ImmutableLabelInfo to sort labels lexicographically before assigning IDs. This ensures that the mapping from Label to ID is deterministic and doesn't depend on the iteration order of HashMap. Updated the regression testsTestSGDLinear, TestFMClassification, TestClassificationEnsembles that relied on the previous non-deterministic ID assignment. This made sure the comparisons in the tests are not order-dependent.

Motivation

NonDex detected test flakiness in 4 tests:

org.tribuo.classification.mnb.TestMNB.testSingleClassTraining
org.tribuo.classification.SerializationTest.load431Protobufs
org.tribuo.classification.sgd.fm.TestFMClassification.loadProtobufModel
org.tribuo.classification.sgd.linear.TestSGDLinear.testSingleClassTraining

This PR is verified by NonDex to fix them all. The root cause for all of them was that the order of label IDs depended on HashMap iteration order, which has no guarantee of determinism. When the ID assignment changed, the resulting model parameters and predictions varied. By sorting the labels, the ID assignment will be consistent and make model training deterministic.

Craigacp · 2025-11-30T02:55:17Z

We had to do this for the regression infos some time ago to fix a nasty indexing bug there, however it required lots of juggling in the models to fix as different indices cause problems. I don't think this could cause similar problems, but the iteration order here of the new style ones is guaranteed to be in increasing string sort order, and old ones won't be (as they still will deserialize into a HashMap not a LinkedHashMap and they can't be fixed to be in string sort order as that would break existing models). I'll need to think more about if this is safe and if we can construct some tests to check that it is safe.

If we do do this then you should use the TreeSet idiom we have in ImmutableRegressionInfo to construct the sorted keys, it's a bit shorter and I'd prefer to keep things consistent - https://github.com/oracle/tribuo/blob/main/Regression/Core/src/main/java/org/tribuo/regression/ImmutableRegressionInfo.java#L81.

Fix flaky tests by making ImmutableLabelInfo ID assignment deterministic

a14d211

oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Nov 29, 2025

Use TreeSet to sort in ImmutableLabelInfo

46d6f1f

Craigacp mentioned this pull request Dec 1, 2025

Fix flaky test in TestClassificationEnsembles by stabilizing HashMap iteration order #420

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix flaky tests by making ImmutableLabelInfo ID assignment deterministic #424

Fix flaky tests by making ImmutableLabelInfo ID assignment deterministic #424

lbh930 commented Nov 29, 2025

Uh oh!

Craigacp commented Nov 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix flaky tests by making ImmutableLabelInfo ID assignment deterministic #424

Are you sure you want to change the base?

Fix flaky tests by making ImmutableLabelInfo ID assignment deterministic #424

Conversation

lbh930 commented Nov 29, 2025

Description

Motivation

Uh oh!

Craigacp commented Nov 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants