Fix flaky Hypothesis stateful test for unique name generation #10946

max-sixty · 2025-11-23T21:36:00Z

Claude tried to fix the flakiness issue in CI. I probed its logic a bit; it looks reasonable, but I don't fully understand it. Our summary below...

Summary

- Fix flaky DatasetStateMachine test that was causing CI failures with "FlakyStrategyDefinition: Inconsistent data generation!"
- Move unique name tracking from the st.shared() strategy to state machine instance state

Problem

The nightly Slow Hypothesis CI has been failing:
https://github.com/pydata/xarray/actions/runs/19603404994

The test properties/test_index_manipulation.py::DatasetTest::runTest fails intermittently with:
hypothesis.errors.FlakyStrategyDefinition: Inconsistent data generation!
Data generation behaved differently between different runs.
Is your data generation depending on external state?

Root Cause

The original code used a pattern from [Stack Overflow](https://stackoverflow.com/questions/73737073/create-hypothesis-strategy-that-returns-unique-values):

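import hypothesis.strategies as st


@st.composite
def unique(draw, strategy):
    # One mutable set, shared by every draw that uses this key in a test case.
    seen = draw(st.shared(st.builds(set), key="key-for-unique-elems"))
    # Reject values already seen, then record the accepted value as a side effect.
    return draw(
        strategy.filter(lambda x: x not in seen).map(lambda x: seen.add(x) or x)
    )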

While this pattern works for regular @given tests, it does not work for RuleBasedStateMachine
because:

1. The shared set's contents depend on which rules ran before
2. During shrinking, Hypothesis replays with different rule orderings
3. This causes different names to be drawn, breaking determinism
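The dependence on prior state can be illustrated with a toy model in plain Python. This is not how Hypothesis works internally; `first_unseen` and the `choices` list are hypothetical stand-ins for "filter against the set of seen names" and "a recorded sequence of draws". It only shows that the same recorded input yields a different name once the set already holds something:

```python
def first_unseen(candidates, seen):
    """Return the first candidate not in `seen`, recording it as used."""
    for name in candidates:
        if name not in seen:
            seen.add(name)
            return name
    raise ValueError("all candidates already used")


# The same recorded choices, replayed against different prior state.
choices = ["a", "b", "c"]

fresh_run = first_unseen(choices, seen=set())
after_other_rule = first_unseen(choices, seen={"a"})

assert fresh_run == "a"
assert after_other_rule == "b"  # same input, different output
```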

Per HypothesisWorks/hypothesis#2338:
"I think you'll need to set these up as instance attributes, rather than class attributes."

Solution

Move uniqueness tracking to state machine instance state:

- Added self.used_names: set[str] (reset per test case via __init__)
- Added _draw_unique_name() that filters and tracks used names
- Updated all rules to use data=st.data() and call helper methods

This follows the Hypothesis-recommended pattern for stateful tests where cross-rule state belongs on
the instance.
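A minimal sketch of the instance-state pattern described above, assuming a simplified machine rather than the actual xarray test. Only `used_names` and `_draw_unique_name` come from this description; the `names` strategy, the `add_variable` rule, the `variables` list, and the invariant are hypothetical and exist only for illustration:

```python
import hypothesis.strategies as st
from hypothesis import settings
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule

# Hypothetical name strategy; the real test draws xarray variable and dimension names.
names = st.text(alphabet="abcdefgh", min_size=1, max_size=3)


class UniqueNameMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        # Instance state: a fresh, empty set for every test case Hypothesis runs.
        self.used_names: set[str] = set()
        self.variables: list[str] = []

    def _draw_unique_name(self, data) -> str:
        # Filter against names already used in this run, then record the draw.
        name = data.draw(names.filter(lambda x: x not in self.used_names))
        self.used_names.add(name)
        return name

    @rule(data=st.data())
    def add_variable(self, data):
        self.variables.append(self._draw_unique_name(data))

    @invariant()
    def names_are_unique(self):
        assert len(self.variables) == len(set(self.variables))


# unittest-compatible test case that pytest or unittest can collect.
UniqueNameTest = UniqueNameMachine.TestCase
UniqueNameTest.settings = settings(
    max_examples=50, stateful_step_count=20, deadline=None
)
```

Because the set is created in `__init__`, its contents are determined entirely by the rules executed in the current run, so replaying the same steps sees the same set.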


The original approach used st.shared(st.builds(set), ...) with mutation to track unique names. While this pattern works for regular @given tests, it doesn't work correctly in RuleBasedStateMachine because the shared set's contents depend on which rules ran before, causing inconsistent replay during shrinking. Per Hypothesis maintainer guidance, moved uniqueness tracking to instance state (self.used_names) which is properly isolated per test case.

Co-authored-by: Claude <noreply@anthropic.com>

dcherian

Thank you! I looked at it a couple of months ago and couldn't figure out what was going on

github-actions bot added the topic-hypothesis Strategies or tests using the hypothesis library label Nov 23, 2025

dcherian approved these changes Nov 24, 2025


dcherian added the run-slow-hypothesis Run slow hypothesis tests label Nov 24, 2025

max-sixty merged commit 42669f9 into pydata:main Nov 24, 2025
62 of 63 checks passed

max-sixty deleted the ci branch November 24, 2025 07:59
