Collaborator

@max-sixty max-sixty commented Nov 23, 2025

Claude tried to fix the flakiness issue in CI. I probed its logic a bit; it looks reasonable, but I don't fully understand it. Our summary is below.


Summary

  • Fix flaky DatasetStateMachine test that was causing CI failures with FlakyStrategyDefinition: Inconsistent data generation!
  • Move unique name tracking from st.shared() strategy to state machine instance state

Problem

The nightly Slow Hypothesis CI has been failing:
https://github.com/pydata/xarray/actions/runs/19603404994

The test properties/test_index_manipulation.py::DatasetTest::runTest fails intermittently with:
hypothesis.errors.FlakyStrategyDefinition: Inconsistent data generation!
Data generation behaved differently between different runs.
Is your data generation depending on external state?

Root Cause

The original code used a pattern from [Stack Overflow](https://stackoverflow.com/questions/73737073/create-hypothesis-strategy-that-returns-unique-values):

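from hypothesis import strategies as st

@st.composite
def unique(draw, strategy):
    # One mutable set, shared by every draw that uses the same key
    seen = draw(st.shared(st.builds(set), key="key-for-unique-elems"))
    return draw(
        # Reject values already seen, then record each accepted value
        strategy.filter(lambda x: x not in seen).map(lambda x: seen.add(x) or x)
    )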

While this pattern works for regular @given tests, it does not work for RuleBasedStateMachine
because:

  • The shared set's contents depend on which rules ran before
  • During shrinking, Hypothesis replays with different rule orderings
  • This causes different names to be drawn, breaking determinism
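
To make the bullets above concrete, here is a toy model in plain Python. It is not Hypothesis internals, and `draw_unique` and the candidate list are made up for illustration. It only shows that a filter backed by a mutable set can return a different value for the same candidates when an earlier draw is removed from the sequence:

```python
def draw_unique(candidates, seen):
    """Return the first candidate not in `seen` and record it as used.

    Hypothetical stand-in for "filter on a mutable set, then mutate it".
    """
    for name in candidates:
        if name not in seen:
            seen.add(name)
            return name
    raise ValueError("no unused candidate left")


candidates = ["a", "b", "c"]

# Original run: rule A draws a name first, then rule B draws one.
seen_first = set()
name_for_a = draw_unique(candidates, seen_first)
name_for_b = draw_unique(candidates, seen_first)

# Replay with rule A dropped (as a shrinker might try): rule B now
# starts from an empty set and gets a different name for the same input.
seen_replay = set()
name_for_b_replay = draw_unique(candidates, seen_replay)

print(name_for_b, name_for_b_replay)
```

The same candidate stream gives rule B a different name in each run, which is the kind of history dependence the bullets describe.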

Per HypothesisWorks/hypothesis#2338:
"I think you'll need to set these up as instance attributes, rather than class attributes."

Solution

Move uniqueness tracking to state machine instance state:

  • Added self.used_names: set[str] (reset per test case via __init__)
  • Added _draw_unique_name() that filters and tracks used names
  • Updated all rules to use data=st.data() and call helper methods

This follows the Hypothesis-recommended pattern for stateful tests where cross-rule state belongs on
the instance.

The original approach used st.shared(st.builds(set), ...) with mutation to
track unique names. While this pattern works for regular @given tests, it
doesn't work correctly in RuleBasedStateMachine because the shared set's
contents depend on which rules ran before, causing inconsistent replay
during shrinking.

Per Hypothesis maintainer guidance, moved uniqueness tracking to instance
state (self.used_names) which is properly isolated per test case.

Co-authored-by: Claude <noreply@anthropic.com>
@github-actions github-actions bot added the topic-hypothesis Strategies or tests using the hypothesis library label Nov 23, 2025
Contributor

@dcherian dcherian left a comment

Thank you! I looked at it a couple of months ago and couldn't figure out what was going on

@dcherian dcherian added the run-slow-hypothesis Run slow hypothesis tests label Nov 24, 2025
@max-sixty max-sixty merged commit 42669f9 into pydata:main Nov 24, 2025
62 of 63 checks passed
@max-sixty max-sixty deleted the ci branch November 24, 2025 07:59