TST: TestMICE.test_combine, test_corrpsd_threshold[0], test_mixedlm failing on Debian unstable #7911

Open
rebecca-palmer opened this issue Nov 27, 2021 · 2 comments

@rebecca-palmer (Contributor)

Describe the bug

In Debian unstable, TestMICE.test_combine, test_corrpsd_threshold[0] and test_mixedlm are failing. (This log is from statsmodels 0.12.2, but 0.13.1 has the same errors; I haven't tried current main.)

As the output of test_corrpsd_threshold is so close to 0, and the results of TestMICE.test_combine and test_mixedlm depend substantially on the np.random state, I suspect that this is a rounding issue rather than a real incorrect-results bug, but I don't have proof of that.
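For the corrpsd case, the margin involved can be seen by rerunning the failing assertion outside the test suite. A minimal sketch, lifted directly from the test itself (corr_nearest is the statsmodels function under test):

import numpy as np
from statsmodels.stats.correlation_tools import corr_nearest

# Same 3x3 correlation matrix as test_corrpsd_threshold
x = np.array([[1, -0.9, -0.9], [-0.9, 1, -0.9], [-0.9, -0.9, 1]])
y = corr_nearest(x, n_fact=100, threshold=0)

# The test asserts the smallest eigenvalue equals the threshold (0) within
# atol=1e-15; on the failing systems it comes out as ~1.05e-15, i.e. a few
# ULPs past the tolerance.
print(np.linalg.eigvalsh(y)[0])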

=================================== FAILURES ===================================
____________________________ TestMICE.test_combine _____________________________

self = <statsmodels.imputation.tests.test_mice.TestMICE object at 0x7f946cec9070>

    @pytest.mark.slow
    def test_combine(self):
    
        np.random.seed(3897)
        x1 = np.random.normal(size=300)
        x2 = np.random.normal(size=300)
        y = x1 + x2 + np.random.normal(size=300)
        x1[0:100] = np.nan
        x2[250:] = np.nan
        df = pd.DataFrame({"x1": x1, "x2": x2, "y": y})
        idata = mice.MICEData(df)
        mi = mice.MICE("y ~ x1 + x2", sm.OLS, idata, n_skip=20)
        result = mi.fit(10, 20)
    
        fmi = np.asarray([0.1778143, 0.11057262, 0.29626521])
>       assert_allclose(result.frac_miss_info, fmi, atol=1e-5)
E       AssertionError:
E       Not equal to tolerance rtol=1e-07, atol=1e-05
E
E       Mismatched elements: 3 / 3 (100%)
E       Max absolute difference: 0.17686937
E       Max relative difference: 1.59957657
E        x: array([0.230217, 0.287442, 0.322124])
E        y: array([0.177814, 0.110573, 0.296265])

/usr/lib/python3/dist-packages/statsmodels/imputation/tests/test_mice.py:366: AssertionError
__________________________ test_corrpsd_threshold[0] ___________________________

threshold = 0

    @pytest.mark.parametrize('threshold', [0, 1e-15, 1e-10, 1e-6])
    def test_corrpsd_threshold(threshold):
        x = np.array([[1, -0.9, -0.9], [-0.9, 1, -0.9], [-0.9, -0.9, 1]])
    
        y = corr_nearest(x, n_fact=100, threshold=threshold)
        evals = np.linalg.eigvalsh(y)
>       assert_allclose(evals[0], threshold, rtol=1e-6, atol=1e-15)
E       AssertionError:
E       Not equal to tolerance rtol=1e-06, atol=1e-15
E
E       Mismatched elements: 1 / 1 (100%)
E       Max absolute difference: 1.05471187e-15
E       Max relative difference: inf
E        x: array(1.054712e-15)
E        y: array(0)

/usr/lib/python3/dist-packages/statsmodels/stats/tests/test_corrpsd.py:196: AssertionError
_________________________________ test_mixedlm _________________________________

    def test_mixedlm():
    
        np.random.seed(3424)
    
        n = 200
    
        # The exposure (not time varying)
        x = np.random.normal(size=n)
        xv = np.outer(x, np.ones(3))
    
        # The mediator (with random intercept)
        mx = np.asarray([4., 4, 1])
        mx /= np.sqrt(np.sum(mx**2))
        med = mx[0] * np.outer(x, np.ones(3))
        med += mx[1] * np.outer(np.random.normal(size=n), np.ones(3))
        med += mx[2] * np.random.normal(size=(n, 3))
    
        # The outcome (exposure and mediator effects)
        ey = np.outer(x, np.r_[0, 0.5, 1]) + med
    
        # Random structure of the outcome (random intercept and slope)
        ex = np.asarray([5., 2, 2])
        ex /= np.sqrt(np.sum(ex**2))
        e = ex[0] * np.outer(np.random.normal(size=n), np.ones(3))
        e += ex[1] * np.outer(np.random.normal(size=n), np.r_[-1, 0, 1])
        e += ex[2] * np.random.normal(size=(n, 3))
        y = ey + e
    
        # Group membership
        idx = np.outer(np.arange(n), np.ones(3))
    
        # Time
        tim = np.outer(np.ones(n), np.r_[-1, 0, 1])
    
        df = pd.DataFrame({"y": y.flatten(), "x": xv.flatten(),
                           "id": idx.flatten(), "time": tim.flatten(),
                           "med": med.flatten()})
    
        mediator_model = sm.MixedLM.from_formula("med ~ x", groups="id", data=df)
        outcome_model = sm.MixedLM.from_formula("y ~ med + x", groups="id", data=df)
        me = Mediation(outcome_model, mediator_model, "x", "med")
        mr = me.fit(n_rep=2)
        st = mr.summary()
        pm = st.loc["Prop. mediated (average)", "Estimate"]
>       assert_allclose(pm, 0.52, rtol=1e-2, atol=1e-2)
E       AssertionError:
E       Not equal to tolerance rtol=0.01, atol=0.01
E
E       Mismatched elements: 1 / 1 (100%)
E       Max absolute difference: 0.01958632
E       Max relative difference: 0.03766599
E        x: array(0.539586)
E        y: array(0.52)

/usr/lib/python3/dist-packages/statsmodels/stats/tests/test_mediation.py:214: AssertionError

Code Sample, a copy-pastable example if possible

The statsmodels test suite.

Expected Output

The tests should pass.

Output of import statsmodels.api as sm; sm.show_versions()

The problem started when Debian upgraded from libblas3/liblapack3 3.9 to 3.10.

Python 3.9, numpy 1.19, scipy 1.7, matplotlib 3.3, pandas 1.1.
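Since the regression coincided with the BLAS/LAPACK upgrade, it may help to record which BLAS/LAPACK build NumPy is using on an affected machine. A minimal check via NumPy's public np.show_config() (output format varies by NumPy version):

import numpy as np

# Prints the BLAS/LAPACK libraries NumPy was configured with at build time.
np.show_config()

If threadpoolctl happens to be installed, python -m threadpoolctl -i numpy additionally reports the BLAS actually loaded at runtime, which can differ from the build-time configuration on distributions with switchable BLAS alternatives (as Debian has).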

@josef-pkt (Member)

The fix for corrpsd is most likely #3716 or something like that.

I don't know the code for the other two well enough to guess how fragile they are or why they broke.

@ArchangeGabriel

FWIW, I see the same three failures on an up-to-date Arch Linux with 0.13.2:

____________________________ TestMICE.test_combine _____________________________

self = <statsmodels.imputation.tests.test_mice.TestMICE object at 0x7f21c0ea6800>

    @pytest.mark.slow
    def test_combine(self):
    
        np.random.seed(3897)
        x1 = np.random.normal(size=300)
        x2 = np.random.normal(size=300)
        y = x1 + x2 + np.random.normal(size=300)
        x1[0:100] = np.nan
        x2[250:] = np.nan
        df = pd.DataFrame({"x1": x1, "x2": x2, "y": y})
        idata = mice.MICEData(df)
        mi = mice.MICE("y ~ x1 + x2", sm.OLS, idata, n_skip=20)
        result = mi.fit(10, 20)
    
        fmi = np.asarray([0.1778143, 0.11057262, 0.29626521])
>       assert_allclose(result.frac_miss_info, fmi, atol=1e-5)
E       AssertionError: 
E       Not equal to tolerance rtol=1e-07, atol=1e-05
E       
E       Mismatched elements: 3 / 3 (100%)
E       Max absolute difference: 0.17686937
E       Max relative difference: 1.59957657
E        x: array([0.230217, 0.287442, 0.322124])
E        y: array([0.177814, 0.110573, 0.296265])

statsmodels/imputation/tests/test_mice.py:366: AssertionError
__________________________ test_corrpsd_threshold[0] ___________________________

threshold = 0

    @pytest.mark.parametrize('threshold', [0, 1e-15, 1e-10, 1e-6])
    def test_corrpsd_threshold(threshold):
        x = np.array([[1, -0.9, -0.9], [-0.9, 1, -0.9], [-0.9, -0.9, 1]])
    
        y = corr_nearest(x, n_fact=100, threshold=threshold)
        evals = np.linalg.eigvalsh(y)
>       assert_allclose(evals[0], threshold, rtol=1e-6, atol=1e-15)
E       AssertionError: 
E       Not equal to tolerance rtol=1e-06, atol=1e-15
E       
E       Mismatched elements: 1 / 1 (100%)
E       Max absolute difference: 1.05471187e-15
E       Max relative difference: inf
E        x: array(1.054712e-15)
E        y: array(0)

statsmodels/stats/tests/test_corrpsd.py:196: AssertionError
_________________________________ test_mixedlm _________________________________

    def test_mixedlm():
    
        np.random.seed(3424)
    
        n = 200
    
        # The exposure (not time varying)
        x = np.random.normal(size=n)
        xv = np.outer(x, np.ones(3))
    
        # The mediator (with random intercept)
        mx = np.asarray([4., 4, 1])
        mx /= np.sqrt(np.sum(mx**2))
        med = mx[0] * np.outer(x, np.ones(3))
        med += mx[1] * np.outer(np.random.normal(size=n), np.ones(3))
        med += mx[2] * np.random.normal(size=(n, 3))
    
        # The outcome (exposure and mediator effects)
        ey = np.outer(x, np.r_[0, 0.5, 1]) + med
    
        # Random structure of the outcome (random intercept and slope)
        ex = np.asarray([5., 2, 2])
        ex /= np.sqrt(np.sum(ex**2))
        e = ex[0] * np.outer(np.random.normal(size=n), np.ones(3))
        e += ex[1] * np.outer(np.random.normal(size=n), np.r_[-1, 0, 1])
        e += ex[2] * np.random.normal(size=(n, 3))
        y = ey + e
    
        # Group membership
        idx = np.outer(np.arange(n), np.ones(3))
    
        # Time
        tim = np.outer(np.ones(n), np.r_[-1, 0, 1])
    
        df = pd.DataFrame({"y": y.flatten(), "x": xv.flatten(),
                           "id": idx.flatten(), "time": tim.flatten(),
                           "med": med.flatten()})
    
        mediator_model = sm.MixedLM.from_formula("med ~ x", groups="id", data=df)
        outcome_model = sm.MixedLM.from_formula("y ~ med + x", groups="id", data=df)
        me = Mediation(outcome_model, mediator_model, "x", "med")
        mr = me.fit(n_rep=2)
        st = mr.summary()
        pm = st.loc["Prop. mediated (average)", "Estimate"]
>       assert_allclose(pm, 0.52, rtol=1e-2, atol=1e-2)
E       AssertionError: 
E       Not equal to tolerance rtol=0.01, atol=0.01
E       
E       Mismatched elements: 1 / 1 (100%)
E       Max absolute difference: 0.01958632
E       Max relative difference: 0.03766599
E        x: array(0.539586)
E        y: array(0.52)

statsmodels/stats/tests/test_mediation.py:214: AssertionError

The corrpsd tests are emitting a warning:

stats/tests/test_corrpsd.py::TestCovPSD::test_cov_nearest
stats/tests/test_corrpsd.py::TestCorrPSD1::test_nearest
stats/tests/test_corrpsd.py::test_corrpsd_threshold[0]
stats/tests/test_corrpsd.py::test_corrpsd_threshold[1e-15]
stats/tests/test_corrpsd.py::test_corrpsd_threshold[1e-10]
stats/tests/test_corrpsd.py::test_corrpsd_threshold[1e-06]
  /build/python-statsmodels/src/statsmodels-0.13.2/build/lib.linux-x86_64-3.10/statsmodels/stats/correlation_tools.py:90: IterationLimitWarning: 
  Maximum iteration reached.
  
    warnings.warn(iteration_limit_doc, IterationLimitWarning)

But I have no idea if that’s related.

No warnings for the two others (but 285 warnings in total).
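In case it helps triage: one quick way to check whether the IterationLimitWarning matters for the corrpsd failures is to raise the iteration cap and see whether the smallest eigenvalue moves. A sketch using corr_nearest's existing n_fact parameter (which scales the maximum number of iterations):

import numpy as np
from statsmodels.stats.correlation_tools import corr_nearest

x = np.array([[1, -0.9, -0.9], [-0.9, 1, -0.9], [-0.9, -0.9, 1]])
for n_fact in (100, 1000, 10000):
    y = corr_nearest(x, n_fact=n_fact, threshold=0)
    # If the algorithm has converged, the smallest eigenvalue should be
    # essentially independent of n_fact.
    print(n_fact, np.linalg.eigvalsh(y)[0])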
