Audit Test Suite #4966

Open · jbrockmendel (Contributor) opened this issue Aug 23, 2018 · 0 comments
There is a lot of heterogeneity in the quality of the tests. An audit-like process should attempt to identify/address (in no particular order):

  • What parts of the code have only smoke tests?
  • What parts of the tests are not getting run? (the first sketch below illustrates these patterns) For example, tests that are:
    • commented out
    • given mangled names that pytest does not collect
    • miscellany like the def junk function in discrete.tests.test_constrained
    • stranded in __main__ sections of test files
    • incorrectly located in __main__ sections of non-test files
  • Are there any xfailed tests that have since been fixed? Can they be marked with strict=True so that an unexpected pass fails loudly? (see the xfail sketch below)
  • A lot of effort went into creating the results files that tests compare against (props to our sm forebears). Are these reproducible? And if not, can they be made reproducible?
  • In some cases results were subsequently "hand-edited" for various reasons. Are these edits well documented?
  • Some "example" files have snuck into test directories; where should they go?
  • Can test runtime be significantly reduced by efficient use of pytest fixtures, e.g. sharing expensive model fits across tests? (see the fixture sketch below)
  • Are there combinations of parameters that can be tested more thoroughly using pytest.mark.parametrize? (see the parametrize sketch below)
  • Can the tests otherwise be made less verbose and clearer?
  • Other modernizations that should be made? E.g., IIUC assert_almost_equal is discouraged and assert_allclose should be used instead. (see the assert_allclose sketch below)
  • Are there places where assert_allclose tolerances can be tightened?
  • grep turns up 5 occurrences of "FIXME" and 218 occurrences of "TODO" in test directories
  • There are a whole bunch of occurrences of things like:

        cls.res1 = mymodel.fit(method="lbfgs", disp=0, maxiter=50000,
                               #m=12, pgtol=1e-7, factr=1e3,  # 5 failures
                               #m=20, pgtol=1e-8, factr=1e2,  # 3 failures
                               #m=30, pgtol=1e-9, factr=1e1,  # 1 failure
                               m=40, pgtol=1e-10, factr=5e0,
                               loglike_and_score=mymodel.loglike_and_score)

        get_robustcov_results(cls.res1._results, 'cluster',
                              groups=group,
                              use_correction=True,
                              df_correction=True,  # TODO has no effect
                              use_t=False,  # True,
                              use_self=True)

        model = sm.Logit(y_bin, x)  #, exposure=np.ones(nobs), offset=np.zeros(nobs))  # bug with default

The TODO is reasonably clear and helpful, but the commented-out True and the commented-out fit parameters are not helpful in their current form.
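As a concrete (and entirely hypothetical) illustration of the "not getting run" bucket, all of the following patterns are silently skipped under pytest's default discovery rules:

    # hypothetical file: none of these run under a plain `pytest` invocation

    def check_positive():             # name doesn't match test_*, never collected
        assert 1 > 0

    # def test_old_behavior():       # commented out entirely
    #     assert 1 > 0

    if __name__ == "__main__":
        # "stranded" tests: executed only when the file is run as a script
        check_positive()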
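A minimal sketch of the strict-xfail idea (the issue number and reason are placeholders): with strict=True, an xfailed test that unexpectedly passes is reported as a failure, so already-fixed xfails surface in CI instead of lingering.

    import pytest

    @pytest.mark.xfail(strict=True, reason="GH#0000: placeholder for a known bug")
    def test_known_bug():
        # if the underlying bug is ever fixed, this test will XPASS and,
        # because strict=True, the suite fails until the mark is removed
        assert 1 + 1 == 3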
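A sketch of the fixture idea, using made-up OLS data: with scope="module" the expensive fit runs once and is shared by every test in the module, instead of being repeated per test in setup_class-style boilerplate.

    import numpy as np
    import pytest
    import statsmodels.api as sm

    @pytest.fixture(scope="module")
    def fitted_ols():
        # hypothetical data; the fit happens once per module, not once per test
        rng = np.random.RandomState(12345)
        exog = sm.add_constant(rng.standard_normal((100, 2)))
        endog = exog @ np.array([1.0, 0.5, -0.3]) + rng.standard_normal(100)
        return sm.OLS(endog, exog).fit()

    def test_params_shape(fitted_ols):
        assert fitted_ols.params.shape == (3,)

    def test_resid_mean(fitted_ols):
        # residuals of an OLS fit with a constant average to ~0
        np.testing.assert_allclose(fitted_ols.resid.mean(), 0, atol=1e-8)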
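A sketch of the parametrize idea (the data and tolerances are made up): one test body expands into a collected case per optimizer, which is both less verbose and more thorough than hand-written near-duplicates.

    import numpy as np
    import pytest
    import statsmodels.api as sm
    from numpy.testing import assert_allclose

    @pytest.mark.parametrize("method", ["newton", "bfgs", "lbfgs"])
    def test_logit_optimizers_agree(method):
        # hypothetical data with real signal so the MLE is well separated from 0
        rng = np.random.RandomState(0)
        exog = sm.add_constant(rng.standard_normal((200, 1)))
        prob = 1 / (1 + np.exp(-exog @ np.array([0.25, 1.0])))
        endog = (rng.uniform(size=200) < prob).astype(float)
        res = sm.Logit(endog, exog).fit(method=method, disp=0)
        baseline = sm.Logit(endog, exog).fit(disp=0)
        # every optimizer should land on (nearly) the same MLE
        assert_allclose(res.params, baseline.params, rtol=1e-4, atol=1e-4)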
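A sketch of the assert_almost_equal -> assert_allclose migration: the decimal argument hides an absolute tolerance inside its name, while assert_allclose makes the relative/absolute tolerances explicit and greppable, which is what makes later tightening feasible.

    import numpy as np
    from numpy.testing import assert_allclose, assert_almost_equal

    actual = np.array([1.0000001, 2.0000002])
    desired = np.array([1.0, 2.0])

    # legacy style: passes iff abs(actual - desired) < 1.5 * 10**-6
    assert_almost_equal(actual, desired, decimal=6)

    # preferred: the tolerance is explicit and can be tightened over time
    assert_allclose(actual, desired, rtol=1e-6)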


This is a pretty huge task. A few steps in this direction: #4941, #4936, #4932, #4907, #4875, #4863, #4506, #4488, #4305
