fix: use robust pseudo p-value for two-sided significance #514

Open
madhavcodez wants to merge 1 commit into pysal:main from madhavcodez:fix/degenerate-two-sided-pvalue
@madhavcodez

Title

fix: use robust pseudo p-value for two-sided significance

Summary

The two-sided branch of _permutation_significance in esda/significance.py
derived the p-value from percentiles of the reference distribution. When the
conditional reference distribution is constant (every permuted value is
identical), those percentiles collapse to a single value, so both the lower and
upper tail counts include the entire reference distribution. The resulting count
exceeds p_permutations, and the p-value can exceed one.

>>> import numpy as np
>>> from esda.significance import calculate_significance
>>> calculate_significance(5.0, np.full((1, 19), 5.0), alternative="two-sided")
1.95

This PR replaces the percentile approach with the robust pseudo p-value
suggested in the issue:

2 * (min(greater, lesser) + 1) / (permutations + 1)

clipped at one. The result stays in (0, 1] for degenerate nulls and matches
the existing one-sided counting conventions used by the greater/lesser
branches, so the directed <= two-sided invariant still holds.
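
A minimal sketch of the clipped formula above, for illustration only. The helper name `two_sided_pseudo_p` is hypothetical and this is not the code in esda/significance.py; it assumes that `greater` and `lesser` are inclusive counts (`>=` and `<=`), which is the convention consistent with the degenerate case evaluating to exactly 1.0.

```python
import numpy as np


def two_sided_pseudo_p(test_stat, reference_distribution):
    """Hypothetical helper: clipped two-sided pseudo p-value, one per row."""
    # Accept a scalar or a vector of observed statistics.
    test_stat = np.atleast_1d(np.asarray(test_stat, dtype=float))
    # One row of permuted values per observed statistic.
    reference = np.atleast_2d(np.asarray(reference_distribution, dtype=float))
    permutations = reference.shape[1]

    # Assumed counting convention: ties count toward both tails.
    greater = (reference >= test_stat[:, None]).sum(axis=1)
    lesser = (reference <= test_stat[:, None]).sum(axis=1)

    # 2 * (min(greater, lesser) + 1) / (permutations + 1), clipped at one.
    p_value = 2 * (np.minimum(greater, lesser) + 1) / (permutations + 1)
    return np.minimum(p_value, 1.0)


constant_null = np.full((1, 19), 5.0)
on_constant = two_sided_pseudo_p(5.0, constant_null)   # 2 * 20 / 20 = 2, clipped to 1
off_constant = two_sided_pseudo_p(6.0, constant_null)  # 2 * (0 + 1) / 20
```

With 19 identical permuted values, a statistic equal to the constant is tied with every permutation, so both counts are 19 and the unclipped value is 2; clipping brings it to 1.0. A statistic off the constant has one empty tail, giving 2 / 20 = 0.1.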

Changes

  • esda/significance.py: two-sided branch now uses the clipped pseudo p-value.
  • esda/tests/test_significance.py: regression tests covering
    • the degenerate constant null with the statistic on the constant, asserting
      the result is exactly 1.0 for both the scalar and vector inputs;
    • the second failure mode of the old formula, a constant null with the
      statistic off the constant, asserting the pseudo p-value 0.1;
    • a seeded normal null, asserting the exact pseudo p-value 0.016;
    • the two-sided == 2 * directed identity on a one-sided statistic.
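
The relationship between the directed and two-sided values that the last test exercises can be sketched as follows. This is an illustrative, self-contained sketch rather than the tests in esda/tests/test_significance.py: the helper names are hypothetical, the reference values are made up for the example, and inclusive tail counts are assumed.

```python
import numpy as np


def directed_pseudo_p(stat, reference, alternative):
    """Hypothetical helper: one-sided pseudo p-value with inclusive counts."""
    reference = np.asarray(reference, dtype=float)
    if alternative == "greater":
        count = (reference >= stat).sum()
    else:  # "lesser"
        count = (reference <= stat).sum()
    return (count + 1) / (reference.size + 1)


def two_sided_from_directed(stat, reference):
    """Twice the smaller directed pseudo p-value, clipped at one."""
    smaller = min(
        directed_pseudo_p(stat, reference, "greater"),
        directed_pseudo_p(stat, reference, "lesser"),
    )
    return min(1.0, 2 * smaller)


reference = np.arange(19.0)  # illustrative null: 0, 1, ..., 18
stat = 17.5                  # sits in the upper tail

p_greater = directed_pseudo_p(stat, reference, "greater")  # (1 + 1) / 20
p_lesser = directed_pseudo_p(stat, reference, "lesser")    # (18 + 1) / 20
p_two_sided = two_sided_from_directed(stat, reference)     # 2 * p_greater
```

Because the two-sided value is twice the smaller directed value before clipping, the smaller directed p-value can never exceed it, and the two agree up to the factor of two whenever the clip is inactive.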

Testing

  • pytest esda/tests/test_significance.py passes (10 tests), including the
    existing test_execution_and_range and test_alternative_relationships.
  • Each new assertion fails on the unpatched source (the degenerate scalar case
    reports 1.95) and passes with the fix.
  • Broad regression on the consumers of the function
    (test_moran.py, test_moran_local_mv.py) passes.

Closes #504

The two-sided alternative derived the p-value from percentiles of the
reference distribution. When the conditional reference distribution is
constant, those percentiles collapse to a single value, so both the lower
and upper tail counts include every permutation. The resulting count
exceeds p_permutations and the p-value can exceed one (e.g. 1.95 for a
constant 19-permutation null).

Replace the percentile approach with the equivalent robust pseudo
p-value, 2 * (min(greater, lesser) + 1) / (permutations + 1), clipped at
one. This keeps the directed (one-sided) p-value no larger than the
two-sided value and stays bounded for degenerate nulls.

Closes pysal#504
Development

Successfully merging this pull request may close these issues.

Degenerate nulls confound significance calculation.